ABSTRACT

The  problem  of  robust  sequential  discrimination  from  two  dependent  observation 
sequences  with  uncertain  statistics  is  addressed.  As  in  Part  I  ([1])  of  this  study,  which  treated 
asymptotically optimal sequential discrimination for stationary observations characterized by
m-dependent or mixing type of dependence, sequential tests based on memoryless nonlinearities
are employed. In particular, the sequential tests robustified in this paper employ linear test
statistics of the form $S_n = A \sum_{i=1}^{n} g(X_i) + Bn$, where $\{X_i\}_{i=1}^{n}$ is the observation sequence, the
coefficients $A$ and $B$ are selected so that the normalized drifts of $S_n$ are antipodal under the
two hypotheses, and the nonlinearity $g$ solves a linear integral equation. As shown in Part I,
the performance of these tests is very close to that of the asymptotically optimal memoryless
sequential tests when the statistics of the observations are known. The above tests are
robustified in terms of the error probabilities and the expected sample numbers under the two
hypotheses, for statistical uncertainty determined by 2-alternating capacity classes for the
marginal (univariate) pdfs and upper bounds on the correlation coefficients of time-shifts of the
observation sequence for the bivariate pdfs. Finally, the robustification of sequential tests
based on a test statistic similar to $S_n$ defined above is carried out for detecting a weak signal in
stationary m-dependent or mixing noise with uncertainty in the univariate and bivariate pdfs.

Systems Research Center at the University of Maryland, College Park, through the National Science Foundation's Engineering
Research Centers Program: NSF CDR 8803012.

1. Introduction

In  Part  I  of  this  study  [1],  we  addressed  the  problem  of  sequential  discrimination  between 
two arbitrary stationary sequences of observations characterized by m-dependent or mixing
type conditions. The statistics necessary for the development of memoryless sequential
discriminators, namely the marginal and bivariate pdfs of the observations, were assumed to be
known. The discriminators derived in [1] employed memoryless nonlinearities and were
optimal  among  the  class  of  such  structures.  Different  types  of  sequential  tests  employing  linear 
or  quadratic  test  statistics  were  considered  and  a  minimization  of  the  expected  sample  numbers 
of  these  tests  under  the  two  hypotheses  for  fixed  desirable  error  probabilities  was  carried  out 
with  respect  to  the  coefficients  of  the  test  statistics  and  the  nonlinearities.  The  performance  of 
the  various  sequential  tests  and  nonlinearities  derived  was  evaluated  via  simulation  for  several 
situations of practical interest encountered in radar discrimination and involving envelope
observations with ρ-mixing dependence.

Before  discussing  the  problem  addressed  here  and  the  contributions  made  by  this  paper 
(which  constitutes  Part  II  of  this  study)  we  summarize  the  most  relevant  conclusions  from  Part 
I  (see  [1]).  These  are  the  following: 

(a) The sequential discriminators employing a linear test statistic of the form $S_n = A T_n + Bn$,
where $T_n = \sum_{i=1}^{n} g(X_i)$, $\{X_i\}_{i=1}^{n}$ is the observation sequence, $A$ and $B$ are coefficients selected so
that the normalized drifts of $S_n$ under the two hypotheses are antipodal, and the nonlinearity $g$
solves a linear integral equation, which depends on the marginal and bivariate pdfs of the
observation sequences under the two hypotheses (refer to equation (59) of [1]), perform only
slightly worse than the discriminators employing quadratic tests and nonlinearities $g$ solving
the appropriate nonlinear integral equation (refer to equation (62) of [1]). The memoryless
discriminators with linear test statistics are easy to implement: to obtain the optimal nonlinearity,
we only need to solve a linear integral equation, which is easily accomplished via discretization
and reduction to a linear system of equations, as discussed in Section 4 of [1], and then
form the sum $T_n$ and the test statistic $S_n$ in a straightforward manner. In this paper, it will be
established that they are also amenable to robustification. The memoryless discriminators with
quadratic test statistics are optimal within the class of memoryless sequential discriminator
structures, because the quadratic processing that follows the formation of the sum $T_n$ is
asymptotically optimal, as it corresponds to the likelihood ratio. Therefore, performance is
compromised very little if one uses the sequential discriminators based on linear test statistics and the
nonlinearity solving a linear integral equation.

(b)  The  sequential  discriminators  described  in  (a)  provide  significant  gains  in  performance 
(meaning  that  they  achieve  the  same  discrimination  reliability  faster,  with  fewer  samples)  when 
compared  to  the  i.i.d.  sequential  discriminators  or  to  the  block  memoryless  discriminators  of 
[2] (the companion paper to Parts I and II) for identical desired error probabilities. This
justifies our recommendation for their use in situations characterized by m-dependent or mixing
types of dependence and our interest in robustifying them for situations characterized by
statistical  uncertainty. 

In this part of the study, we robustify the sequential discriminators above against uncertainty
in the marginal and the bivariate pdfs. The robustification may be necessary for many
situations of practical interest, in which the statistics of the observations are unknown or at best
partially known. The literature on the subject contains a considerable amount of research in
robust signal processing, as attested by the references cited in the tutorial of [3]. However,
most of the work on robust detection is concerned with fixed-sample-size (or block) schemes.
The work of [4] constitutes an exception; it considers robust detection of weak signals in
additive i.i.d. noise for uncertainty in the noise pdf within p-point classes.

This paper makes a twofold contribution. On the one hand, it robustifies the sequential
discriminators of [1], which employ linear test statistics and nonlinearities solving linear
integral equations for two arbitrary stationary sequences of observations with m-dependent or
mixing type of dependence and uncertainty in the marginal and bivariate pdfs. Emphasis is
placed on situations that cannot be characterized as weak signals in additive dependent noise.
As will be shown in the following section, the uncertainty class for the marginal pdfs is
determined by 2-alternating capacities; this is a very general model that includes several popular
uncertainty models as subcases. The uncertainty for the bivariate pdfs is determined by bounds on
the correlation coefficients between time-shifts of the observation sequences. On the other
hand, this paper derives robust sequential detectors for weak signals in additive m-dependent
or mixing noise. Here the uncertainty on the marginal pdfs is of the ε-contaminated or total
variation type, whereas the uncertainty in the bivariate pdfs is determined by bounds on the
correlation coefficients between time-shifts of the noise sequence.

Consequently,  the  first  part  of  the  paper,  which  is  concerned  with  the  robustification  of 
sequential  memoryless  discrimination  schemes,  extends  naturally  the  work  of  [1]  and  [2]  that 
dealt  with  optimal  memoryless  sequential  and  block  discrimination  schemes,  respectively,  for 
known  observation  statistics.  The  second  part  of  the  paper,  which  is  concerned  with  robust 
sequential  memoryless  schemes  for  the  detection  of  weak  signals,  extends  the  results  of  [5]  and 
[6]  for  the  robust  fixed-sample-size  detection  of  weak  signals  in  additive  dependent  noise  to 
sequential detection schemes, while at the same time extending the results of [4] for the
sequential detection of weak signals in i.i.d. noise to the case of dependent noise. Although [4]
deals with a detection problem, the weak signal is first estimated using Huber's M-estimators
and then the estimate is used to form a likelihood ratio, on which a sequential probability ratio test
(SPRT) with Wald's thresholds is performed. In this context, since combined estimation and
detection are used, the process is somewhat complicated: not only does the nonlinearity used in the
estimator need to be derived, but the estimate also needs to be computed from the n observations
at each step of the SPRT, as the latter progresses. The sequential detector proposed in
this paper does not rely on any estimation process; the task is accomplished with a simpler
structure. This comparison is elaborated upon in the sequel.

Our  approach  is  that  of  minimax  robustness,  according  to  which  we  derive  sequential 
discrimination  schemes  that  guarantee  a  desirable  level  of  performance  in  terms  of  the  error 
probabilities  (false  alarm  and  miss)  and  the  expected  sample  numbers  under  the  two  hypotheses 
for  the  least-favorable  elements  in  the  uncertainty  classes  (i.e.,  for  marginal  distributions  in 
capacity  classes  and  bivariate  distributions  satisfying  the  bounds  on  the  correlation  coefficients) 
and  show  that,  for  any  other  elements  in  these  classes,  the  performance  of  the  robust  sequential 
schemes  is  superior. 

This  paper  is  organized  as  follows.  In  Sections  2  and  3,  we  develop  and  analyze  robust 
sequential memoryless schemes for the cases of general discrimination from two arbitrary
stationary dependent observation sequences and of the detection of a weak signal in stationary
dependent noise, respectively. In each of these two sections, we first introduce the necessary
notation and the uncertainty classes for the marginal and bivariate pdfs of the observation
sequences or the noise sequence; then we derive expressions for the error probabilities and the
expected sample numbers under mismatch for the sequential test employed; finally, we derive
the robust sequential memoryless discriminators or detectors for the uncertainty classes of
interest. In Section 4, a summary of the paper and conclusions are presented.

2. Robust Sequential Memoryless Discriminators
2.1  Preliminaries 

As in [1] and [2], the general hypothesis testing problem of this paper is formulated as the
need to discriminate between the two hypotheses

$$H_k:\ \mathbf{X} \sim f_k^{(n)} \quad \text{for } k = 0, 1, \tag{1}$$

where $\mathbf{X} = (X_1, X_2, \ldots, X_n)$ denotes the vector of $n$ dependent observations and $f_k^{(n)}(\mathbf{X})$ the
$n$-dimensional joint pdf of $\mathbf{X}$. For many situations of practical interest characterized by
dependence and non-Gaussian statistics, $f_k^{(n)}$ is hard or even impossible to obtain in closed form, as
density estimation in $n$ dimensions can be a truly formidable task. Therefore, we resort to
models of dependence that are as nonrestrictive as possible and at the same time make the
analysis of discrimination schemes possible.

In [1] and [2], various models of dependence were reviewed. Here we cite only the most
relevant definitions so that we can introduce the necessary notation and make the presentation
in this paper self-sufficient. The simplest model of dependence assumes that, under both
hypotheses, the observations are stationary and m-dependent, meaning that (see [13]) the
stationary data $X_i$ and $X_l$ are correlated with known correlation for $|i - l| \le m_k$ and independent
for $|i - l| > m_k$, under hypothesis $H_k$. The least restrictive dependence model of interest is
that of ρ-mixing, which is characterized by

$$\operatorname{cov}_k\{X, Y\} \le \rho_{k,n} \tag{2a}$$

for all real $X \in L_2(A)$ and $Y \in L_2(B)$. $X$ and $Y$ are random variables measurable with
respect to $A$ and $B$, respectively, where $A$ is an event from $\mathcal{X}_1^l$, the latter being the σ-algebra
generated by the random variables $\{X_1, X_2, \ldots, X_l\}$, and $B$ an event from $\mathcal{X}_{l+n}^{\infty}$, which is the
σ-algebra generated by the random variables $\{X_{l+n}, X_{l+n+1}, \ldots\}$. The $\rho_{k,n}$ are sequences of real
numbers such that $\rho_{k,n} \to 0$ as $n \to \infty$, for $k = 0, 1$. Equation (2a) implies the weaker but
more intuitive result

$$\operatorname{cov}_k\{X_i, X_{i+n}\} \le \rho_{k,n} \tag{2b}$$

and represents a good model for a time series of data that are asymptotically uncorrelated.

The main component of the test statistic for the discriminators of [1] and [2] is of the
form $T_n(\mathbf{X}) = \sum_{i=1}^{n} g(X_i)$, where the number of samples $n$ is large (e.g., $n \gg m_k$) and $g$ is a
nonlinearity chosen to maximize a suitable performance measure. The means of $T_n(\mathbf{X})$ under
the two hypotheses $H_1$ and $H_0$ are given by $n\mu_1$ and $n\mu_0$, respectively, where

$$\mu_k = E_k\{g(X_1)\} = \int g(x) f_k(x)\,dx, \quad k = 0, 1. \tag{3}$$

$E_k$ denotes expectation under hypothesis $H_k$ and $f_k(x)$ is the corresponding marginal density.
The asymptotic variance of $T_n(\mathbf{X})$ under hypothesis $H_k$ is $n\sigma_k^2$, where
$\sigma_k^2 = \lim_{n\to\infty} n^{-1}\operatorname{var}_k\{T_n\}$ ($k = 0, 1$) is given by

$$\sigma_k^2(g) = \operatorname{var}_k\{g(X_1)\} + 2\sum_{j=1}^{m_k} \operatorname{cov}_k\{g(X_1), g(X_{j+1})\}
= E_k\{g(X_1)^2\} + 2\sum_{j=1}^{m_k} E_k\{g(X_1)g(X_{j+1})\} - (2m_k+1)\left[E_k\{g(X_1)\}\right]^2 \tag{4a}$$

for m-dependent observations, and by

$$\sigma_k^2(g) = E_k\{g(X_1)^2\} - \left[E_k\{g(X_1)\}\right]^2 + 2\sum_{j=1}^{\infty}\left[E_k\{g(X_1)g(X_{j+1})\} - \left[E_k\{g(X_1)\}\right]^2\right] \tag{4b}$$

for ρ-mixing observations.
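As an illustration of (4a), the following Python sketch evaluates the asymptotic variance from the lag covariances of a hypothetical 1-dependent sequence. The moving-average model $X_i = Z_i + \theta Z_{i-1}$ with i.i.d. unit-variance $Z_i$, the identity nonlinearity $g(x) = x$, and the function names are assumptions made for illustration only; they are not taken from [1] or [2].

```python
def asymptotic_variance(var_g, lag_covs):
    """Evaluate (4a): var{g(X_1)} + 2 * sum over j of cov{g(X_1), g(X_{j+1})}."""
    return var_g + 2.0 * sum(lag_covs)


def ma1_moments(theta):
    """Variance and lag-1 covariance of X_i = Z_i + theta * Z_{i-1}, Var(Z_i) = 1.

    This toy process is 1-dependent: covariances vanish beyond lag 1.
    """
    return 1.0 + theta ** 2, [theta]


theta = 0.5
var_g, lag_covs = ma1_moments(theta)
sigma2 = asymptotic_variance(var_g, lag_covs)
print(sigma2)  # equals (1 + theta)**2 for this toy model
```

For this toy model the positive lag-1 covariance inflates the asymptotic variance from 1.25 to 2.25, which is the effect that the bounds on the correlation coefficients are later used to control.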

In addition, $T_n(\mathbf{X})$ is asymptotically Gaussian distributed under hypothesis $H_k$ with
mean $n\mu_k$ and variance $n\sigma_k^2$ cited above, provided that $\sigma_k^2 > 0$ and some other conditions
hold. This follows from the validity of the Central Limit Theorem (CLT) for dependent
observations (see [13] and particularly the tutorial in [14], which provides CLTs for various
mixing types of stationary observations). For example, in the ρ-mixing case, the condition for a
CLT to hold is that the variance in (4b) satisfies $\sigma_k^2 > 0$ and that $\sum_{n=0}^{\infty} \rho_{k,2^n} < \infty$ (see [12]). Actually, the
existence and validity of Central Limit Theorems for quantities like $T_n$ formed from the
dependent observations of (1) constitute the basis for the remainder of this paper.

The sequential test (SPRT) to be robustified is based on the linear test statistic of (33a) of
[1], which employs the nonlinearity solving the linear integral equation of (59)-(60) of [1].
The reasons for this choice of test statistic and nonlinearity have already been discussed in
Section 1 of this paper and in Section 5 of Part I [1]. The test statistic of interest is expressed
as

$$S_n = \frac{2(\hat\mu_1 - \hat\mu_0)}{\hat\sigma_1^2 + \hat\sigma_0^2}\left[\sum_{i=1}^{n} g(X_i) - \frac{\hat\mu_1\hat\sigma_0^2 + \hat\mu_0\hat\sigma_1^2}{\hat\sigma_1^2 + \hat\sigma_0^2}\, n\right] \tag{5}$$

and it is compared to Wald's thresholds

$$a = \ln\frac{\hat\beta}{1 - \hat\alpha} < 0 \tag{6a}$$

and

$$b = \ln\frac{1 - \hat\beta}{\hat\alpha} > 0 \tag{6b}$$

where $\hat\alpha$ and $\hat\beta$ are the desired error probabilities for the SPRT. In (5), the means $\hat\mu_k$ and
variances $\hat\sigma_k^2$, for $k = 0, 1$, are obtained from (3) and (4a)-(4b), respectively, upon substitution
for the nonlinearity $g$ and the marginal and bivariate pdfs $\hat f_k$ and $\{\hat f_k^{(1,j+1)}\}_{j=1}^{m_k}$. These pdfs (i)
may represent estimates of the statistics that govern the observation sequences under the two
hypotheses ($k = 0, 1$) and thus could be different from the actual statistics of the observations,
or (ii) they may be chosen to characterize the least-favorable conditions for the operation of the
test of (5) within certain uncertainty classes (as will be done in Section 2.2 below). For
notational convenience, we use $\hat f_k^{(j)}(x,y)$ instead of $\hat f_k^{(1,j+1)}(x,y)$ henceforth. We assume that
these marginal and bivariate pdfs, from which $g$ is determined, always exist and the
corresponding distributions (cdfs) are denoted by $\hat F_k$ and $\hat F_k^{(j)}$. We denote by $\hat F_k^*$ the pair
$(\hat F_k, \{\hat F_k^{(j)}\}_{j=1}^{m_k})$.
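The coefficients of the linear statistic and Wald's thresholds are simple functions of the design moments. The Python sketch below assumes that (5) is read as $S_n = A\sum_i g(X_i) + Bn$ with $A = 2(\hat\mu_1 - \hat\mu_0)/(\hat\sigma_1^2 + \hat\sigma_0^2)$ and $B = -A(\hat\mu_1\hat\sigma_0^2 + \hat\mu_0\hat\sigma_1^2)/(\hat\sigma_1^2 + \hat\sigma_0^2)$, together with the standard Wald thresholds. The function names and the numerical moments are hypothetical. It checks numerically that, under matched conditions, twice the per-sample drift divided by the per-sample variance of $S_n$ is $-1$ under $H_0$ and $+1$ under $H_1$, which is the antipodal property referred to in the text.

```python
import math


def wald_thresholds(alpha_hat, beta_hat):
    """Lower and upper SPRT thresholds for the desired error probabilities."""
    a = math.log(beta_hat / (1.0 - alpha_hat))   # lower threshold, negative
    b = math.log((1.0 - beta_hat) / alpha_hat)   # upper threshold, positive
    return a, b


def linear_coefficients(mu0, mu1, var0, var1):
    """Return (A, B) such that S_n = A * sum(g(X_i)) + B * n."""
    s = var0 + var1
    A = 2.0 * (mu1 - mu0) / s
    B = -A * (mu1 * var0 + mu0 * var1) / s
    return A, B


def normalized_drift(mu, var, A, B):
    """Twice the per-sample drift of S_n divided by its per-sample variance."""
    return 2.0 * (A * mu + B) / (A * A * var)


# Hypothetical design moments of g under H0 and H1 (asymptotic variances).
mu0, mu1, var0, var1 = 0.0, 1.0, 1.0, 4.0
A, B = linear_coefficients(mu0, mu1, var0, var1)
a, b = wald_thresholds(0.01, 0.05)
c0 = normalized_drift(mu0, var0, A, B)
c1 = normalized_drift(mu1, var1, A, B)
print(A, B, a, b, c0, c1)
```

The antipodal property holds for any choice of design moments with distinct means, because the offset $B$ places the decision point at the variance-weighted average of the two means.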

The  linear  integral  equation  that  g  solves  was  derived  in  [1]  but  is  cited  here  for  the  sake 
of  completeness: 


$$\frac{\cdots}{w(\hat\alpha,\hat\beta)\,\hat f_1(x) + w(\hat\beta,\hat\alpha)\,\hat f_0(x)} - \int K(x,y)\,g(y)\,dy = g(x) \tag{7}$$


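where

$$K(x,y) = \frac{w(\hat\alpha,\hat\beta)\,\hat K_1(x,y) + w(\hat\beta,\hat\alpha)\,\hat K_0(x,y)}{w(\hat\alpha,\hat\beta)\,\hat f_1(x) + w(\hat\beta,\hat\alpha)\,\hat f_0(x)}. \tag{8}$$

The kernels $\hat K_k(x,y)$ for $k = 0, 1$ are defined as

$$\hat K_k(x,y) = \sum_{j=1}^{m_k}\left[\hat f_k^{(j)}(x,y) + \hat f_k^{(j)}(y,x) - 2\hat f_k(x)\hat f_k(y)\right] - \hat f_k(x)\hat f_k(y) \tag{9}$$

for m-dependent observations; $m_k$ should be replaced by $\infty$ for ρ-mixing observations. The
function $w(x,y)$ is defined as

$$w(x,y) = (1 - x)\ln\frac{1 - x}{y} + x\ln\frac{x}{1 - y} \tag{10}$$

and, as shown in [1], for desired error probabilities $\hat\alpha$ and $\hat\beta$, $w(\hat\alpha,\hat\beta) > 0$ and $w(\hat\beta,\hat\alpha) > 0$.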
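The text above cites Section 4 of [1] for solving the integral equation by discretization and reduction to a linear system. The Python sketch below illustrates that idea for a generic equation of the same form, $g(x) = h(x) - \int K(x,y)\,g(y)\,dy$, using a trapezoidal grid and Gaussian elimination. The forcing term `h`, the kernel `K`, the finite interval, and all function names are placeholders chosen so that the exact solution $g(x) = 0.75x$ is known; they are not the pdf-based terms of (7)-(9), and the scheme is not claimed to be the one used in [1].

```python
def solve_linear_system(M, rhs):
    """Gaussian elimination with partial pivoting; returns the solution vector."""
    n = len(rhs)
    A = [row[:] + [rhs[i]] for i, row in enumerate(M)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (A[i][n] - s) / A[i][i]
    return x


def solve_fredholm(h, K, lo, hi, n=101):
    """Solve g(x) = h(x) - int_lo^hi K(x, y) g(y) dy on an n-point trapezoid grid."""
    step = (hi - lo) / (n - 1)
    xs = [lo + i * step for i in range(n)]
    ws = [step] * n
    ws[0] = ws[-1] = step / 2.0  # trapezoidal end-point weights
    # Discretized equation: (I + K W) g = h
    M = [[(1.0 if i == j else 0.0) + K(xs[i], xs[j]) * ws[j] for j in range(n)]
         for i in range(n)]
    g = solve_linear_system(M, [h(x) for x in xs])
    return xs, g


# Toy problem with known solution g(x) = 0.75 * x on [0, 1].
xs, g = solve_fredholm(lambda x: x, lambda x, y: x * y, 0.0, 1.0)
print(g[50], g[-1])
```

In an application to (7), the grid would have to cover the effective support of the pdfs, and the kernel would be assembled from (8)-(9); those steps are not shown here.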

The operating conditions of the above test statistic are determined for our analysis, which
involves only first and second order statistics, by the actual marginal $F_k(x)$ and bivariate
$\{F_k^{(j)}\}_{j=1}^{m_k}$ distributions ($k = 0, 1$) of the observations, which are generally different from the
ones involved in (5) and (7)-(9). This situation is called mismatch and plays an important role
in the robustification of the test in (5). The above distributions may or may not have densities;
when they have pdfs, these are denoted by $f_k$ and $f_k^{(j)}$. We use $F_k^*$ to denote collectively the
pair $(F_k, \{F_k^{(j)}\}_{j=1}^{m_k})$ of the actual cdfs of the observations under hypothesis $H_k$ ($k = 0, 1$).
Clearly, the asymptotic means of the test statistic $S_n$ under $H_k$ depend on $F_k$ and its
asymptotic variances on $F_k^*$.

We now describe the uncertainty classes to which the marginal and bivariate pdfs
necessary for solving (7) and obtaining $g$ belong, as well as the pdfs characterizing the operating
conditions of the test statistic in (5). The uncertainty classes are identical to those considered in
[2] for the robustification of block memoryless discriminators. These classes constitute an
extension of those considered in [5] and [6] for memoryless block discrimination, in that the
classes for the marginal pdfs treated in this paper and in [1]-[2] are broader. Specifically, the
marginal pdfs are assumed to belong to uncertainty classes determined by 2-alternating
capacities, also termed Huber-Strassen classes. These classes include many popular models of
uncertainty, such as the ε-contaminated classes (see [7]), the total variation classes (see [7]),
the band classes (see [8]), and the p-point classes (see [9]). These classes are characterized by
either a degree of deviation from a nominal (known) pdf or by known upper and lower bounds
(confidence limits) on the members of the class. They can be considered as special cases of a
general uncertainty model involving a capacity as the upper measure of each specific class.
Generalized capacities [10] can also be considered in this context. The basic theory of minimax
robustness for these was developed in [11]. The least favorable elements with respect to the
Bayes risk of these classes have been identified in closed form for each one of the four
uncertainty classes enumerated above. In Appendix A, we review the results of this theory that are
most relevant to our problem and provide a complete example based on the ε-contaminated model. We
assume that the nominal distributions determining the uncertainty classes of (A-1) have
densities (pdfs) and so do the least-favorable distributions singled out by Lemma 1 of Appendix A.

As in [5] and [6], the bivariate pdfs are assumed to belong to classes determined by
bounds on the correlation coefficients between time-shifts of the observation sequence
(assumed to be ρ-mixing in the less restrictive case), namely

$$\sup_{g} \frac{\left|\operatorname{cov}_k\{g(X_1), g(X_{1+j})\}\right|}{\left[\operatorname{var}_k\{g(X_1)\}\operatorname{var}_k\{g(X_{1+j})\}\right]^{1/2}} \le r_{k,j} \tag{11}$$

where $g$ ranges over all measurable functions satisfying $E_k\{g^2(X_1)\} < \infty$ under hypothesis $H_k$,
for $k = 0, 1$. Since we assume stationarity, the denominator in (11) is $\operatorname{var}_k\{g(X_1)\}$. The
parameters $r_{k,j}$ can be obtained from the parameters $\rho_{k,j}$ of the ρ-mixing process $\{X_i\}_{i=1}^{\infty}$
under $H_k$. As proved in Proposition 6 of [2], for processes $\{X_i\}_{i=1}^{\infty}$ with bivariate distributions
having diagonal expansions involving an orthonormal set of polynomials, the supremum of the
correlation coefficient of the process $\{g(X_i)\}_{i=1}^{\infty}$ in (11) can be directly related to the
correlation coefficient and the moments up to order four of the original process $\{X_i\}_{i=1}^{\infty}$. Examples of
processes with such expansions are those with Gaussian or Gamma distributions and processes
obtained from them via memoryless transformations.

For a given marginal distribution $F_k$, equality holds in (11) for all $g$ if the bivariate
distribution function is

$$F_k^{(j)}(x,y) = (1 - r_{k,j})F_k(x)F_k(y) + r_{k,j}F_k(x \wedge y) \tag{12a}$$

where $x \wedge y$ is the minimum of $x$ and $y$. If $F_k$ has a density $f_k$, then we may write for the
bivariate pdf

$$f_k^{(j)}(x,y) = (1 - r_{k,j})f_k(x)f_k(y) + r_{k,j}f_k(x)\delta(x - y) \tag{12b}$$

where $\delta(x)$ is Dirac's $\delta$ function. In [6] it was shown by two constructions that there exist
processes with bivariate distributions given by (12). Notice that, if the condition (11) is
satisfied, then (4a) and (4b) imply

$$\sigma_k^2(g) \le (1 + 2R_k)\operatorname{var}_k\{g(X_1)\} \tag{13}$$

with $R_k = \sum_{j=1}^{m_k} r_{k,j}$, where, for the m-dependent model, $m_k$ is the number of samples it takes the
signal to decorrelate (meaning that the covariance vanishes for lags $j > m_k$) under hypothesis $H_k$; for
the ρ-mixing model, $m_k = \infty$. Equality is achieved in (13) for the cdfs defined by (12), and
thus (12) has maximum variance among all cdfs in the class defined by (11). In this
formulation, it turns out that the value of the sum $R_k$, rather than the individual terms of the sum, is
relevant to the robustification that follows.
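The bivariate form (12a) can be checked numerically for a single pair of samples. The Python sketch below draws $X_1$ from a marginal and sets $X_{1+j}$ equal to $X_1$ with probability $r$, and to an independent draw otherwise; a pair generated this way has the joint cdf (12a). The standard Gaussian marginal, the test nonlinearities, and the function names are assumptions made for illustration, and this pairwise construction is not one of the two process constructions of [6]. The estimated correlation coefficient of $g(X_1)$ and $g(X_{1+j})$ should be close to $r$ for any $g$ with finite variance.

```python
import math
import random


def sample_pair(rng, r):
    """Draw (X_1, X_{1+j}): a shared value with probability r, independent otherwise."""
    x = rng.gauss(0.0, 1.0)
    y = x if rng.random() < r else rng.gauss(0.0, 1.0)
    return x, y


def empirical_correlation(g, r, n=100_000, seed=0):
    """Monte Carlo estimate of the correlation coefficient of g(X_1) and g(X_{1+j})."""
    rng = random.Random(seed)
    pairs = [sample_pair(rng, r) for _ in range(n)]
    u = [g(x) for x, _ in pairs]
    v = [g(y) for _, y in pairs]
    mu_u, mu_v = sum(u) / n, sum(v) / n
    cov = sum((p - mu_u) * (q - mu_v) for p, q in zip(u, v)) / n
    var_u = sum((p - mu_u) ** 2 for p in u) / n
    var_v = sum((q - mu_v) ** 2 for q in v) / n
    return cov / math.sqrt(var_u * var_v)


def variance_bound(var_g, r_values):
    """Right-hand side of (13): (1 + 2 R_k) var{g(X_1)}, with R_k the sum of r_{k,j}."""
    return (1.0 + 2.0 * sum(r_values)) * var_g


r = 0.3
rho_tanh = empirical_correlation(math.tanh, r)
rho_abs = empirical_correlation(abs, r)
print(rho_tanh, rho_abs, variance_bound(2.0, [0.3, 0.1]))
```

Because the correlation does not depend on the choice of $g$, only the sum of the $r_{k,j}$ enters the variance bound, which is consistent with the remark that $R_k$ is the relevant quantity.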

2.2  Robustification  of  Sequential  Memoryless  Nonlinear  Discriminators 

Before  robustifying  the  performance  measures  of  interest,  we  establish  the  following 
result,  which  provides  the  error  probabilities  and  the  expected  sample  numbers  of  the  sequen¬ 
tial  test  under  mismatch  and  is  used  extensively  in  the  sequel. 


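Proposition 1: Let $P_k(\hat g, F_k^*)$ denote the probability of error under mismatch and $E_k\{N \mid \hat g, F_k^*\}$
the required average sample number, when hypothesis $H_k$ ($k = 0, 1$) is true. Let us assume
that the sequential test of (5), with thresholds $a$ and $b$ defined by (6a)-(6b) for desired error
probabilities $\hat\alpha$ and $\hat\beta$, is employed. Then the following identities hold:

$$P_0(\hat g, F_0^*) = \alpha = P_0\{S_N \text{ up-crosses } b \text{ before it down-crosses } a\}
= \frac{1 - (e^{a})^{2\bar\mu_0(\hat g, F_0)/\bar\sigma_0^2(\hat g, F_0^*)}}{1 - (e^{a-b})^{2\bar\mu_0(\hat g, F_0)/\bar\sigma_0^2(\hat g, F_0^*)}} \tag{14a}$$

$$P_1(\hat g, F_1^*) = \beta = P_1\{S_N \text{ down-crosses } a \text{ before it up-crosses } b\}
= \frac{1 - (e^{b})^{2\bar\mu_1(\hat g, F_1)/\bar\sigma_1^2(\hat g, F_1^*)}}{1 - (e^{b-a})^{2\bar\mu_1(\hat g, F_1)/\bar\sigma_1^2(\hat g, F_1^*)}} \tag{14b}$$

and

$$E_0\{N \mid \hat g, F_0^*\} = \frac{\omega(\hat\alpha, \hat\beta; \alpha)}{-\bar\mu_0(\hat g, F_0)} \tag{15a}$$

$$E_1\{N \mid \hat g, F_1^*\} = \frac{\omega(\hat\beta, \hat\alpha; \beta)}{\bar\mu_1(\hat g, F_1)} \tag{15b}$$

where

$$\bar\mu_k(\hat g, F_k) = \frac{2(\hat\mu_1 - \hat\mu_0)}{\hat\sigma_1^2 + \hat\sigma_0^2}\left[\mu(\hat g, F_k) - \frac{\hat\mu_1\hat\sigma_0^2 + \hat\mu_0\hat\sigma_1^2}{\hat\sigma_1^2 + \hat\sigma_0^2}\right] \tag{16}$$

and

$$\bar\sigma_k^2(\hat g, F_k^*) = \frac{4(\hat\mu_1 - \hat\mu_0)^2}{(\hat\sigma_1^2 + \hat\sigma_0^2)^2}\,\sigma^2(\hat g, F_k^*) \tag{17}$$

for $k = 0, 1$. In (16)-(17), $\mu(\hat g, F_k) = \int \hat g(x)\,dF_k(x) = \lim_{n\to\infty} n^{-1}E_k\{\hat T_n\}$ denotes the asymptotic
mean and $\sigma^2(\hat g, F_k^*) = \lim_{n\to\infty} n^{-1}\operatorname{var}_k\{\hat T_n\}$ the asymptotic variance [obtained from (4a) or (4b)
for $\hat g$ and marginal/bivariate pair $F_k^*$] under mismatch of $\hat T_n = \sum_{i=1}^{n} \hat g(X_i)$, when hypothesis $H_k$ is
true. The corresponding means and variances under matched conditions are denoted by
$\hat\mu_k = \mu(\hat g, \hat F_k)$ and $\hat\sigma_k^2 = \sigma^2(\hat g, \hat F_k^*)$. In this case, the pdfs are assumed to
exist and have already been used in the definition of the test statistic in (5). The rest of the
quantities involved in (14)-(15) are the error probabilities under matched conditions

$$\hat\alpha = P_0\{S_N \ge b\} \tag{18a}$$

and

$$\hat\beta = P_1\{S_N \le a\} \tag{18b}$$

and the quantity

$$\omega(\hat x, \hat y; x) = (1 - x)\ln\frac{1 - \hat x}{\hat y} + x\ln\frac{\hat x}{1 - \hat y} \tag{19}$$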
which, under matched conditions, reduces to $\omega(x, y; x) = w(x, y)$ defined by (10).

Remark 1: Equations (14a)-(14b) and (15a)-(15b) are approximations, which become tight
when the desired probabilities of false alarm and miss are sufficiently small, so that a large
number of samples is required to achieve the desired reliability. If $N$ (the required sample
number) is large under both hypotheses, then (i) the overshoot phenomenon present in Wald's
approximations can be neglected and (ii) the diffusion (Brownian motion) approximation used
in the computation of the means and variances of the test statistic becomes accurate.

Remark 2: The numerators of (15a)-(15b) are usually positive for all situations of interest and
so are the denominators. These facts are established in Proposition 3 below.
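The expressions of Proposition 1 are straightforward to evaluate numerically. The Python sketch below assumes that (14a)-(14b) take the standard two-barrier hitting-probability form for a Brownian motion with normalized drift $c_k = 2\bar\mu_k/\bar\sigma_k^2$, and that (15a)-(15b) and (19) are read as $\omega(\hat\alpha,\hat\beta;\alpha)/(-\bar\mu_0)$ and $\omega(\hat\beta,\hat\alpha;\beta)/\bar\mu_1$. The function names and the numerical drifts are hypothetical, and the case $c_k = 0$ is not handled. Under matched conditions ($c_0 = -1$, $c_1 = 1$) the formulas return the design values $\hat\alpha$ and $\hat\beta$.

```python
import math


def omega(x_hat, y_hat, x):
    """Mismatch version of w: the logs use design values, the weights use actual ones."""
    return ((1.0 - x) * math.log((1.0 - x_hat) / y_hat)
            + x * math.log(x_hat / (1.0 - y_hat)))


def error_probabilities(c0, c1, alpha_hat, beta_hat):
    """Diffusion approximation of the error probabilities for normalized drifts c0 < 0 < c1."""
    a = math.log(beta_hat / (1.0 - alpha_hat))
    b = math.log((1.0 - beta_hat) / alpha_hat)
    alpha = (1.0 - math.exp(a * c0)) / (1.0 - math.exp((a - b) * c0))
    beta = (1.0 - math.exp(b * c1)) / (1.0 - math.exp((b - a) * c1))
    return alpha, beta


def expected_sample_numbers(drift0, drift1, alpha, beta, alpha_hat, beta_hat):
    """Wald-type approximations of the expected sample numbers under H0 and H1."""
    n0 = omega(alpha_hat, beta_hat, alpha) / (-drift0)
    n1 = omega(beta_hat, alpha_hat, beta) / drift1
    return n0, n1


alpha_hat, beta_hat = 0.01, 0.05
matched = error_probabilities(-1.0, 1.0, alpha_hat, beta_hat)
mismatched = error_probabilities(-0.8, 0.8, alpha_hat, beta_hat)  # larger actual variance
n0, n1 = expected_sample_numbers(-0.125, 0.125, *matched, alpha_hat, beta_hat)
print(matched, mismatched, n0, n1)
```

In this sketch a normalized drift of magnitude below one, which corresponds to an actual variance larger than the design variance, raises both error probabilities above their design values.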

Proof: Expressions (14a)-(14b) are obtained by following steps similar to those used for deriving
(25a)-(25b) of [1] (Part I), essentially by applying the Brownian motion approximation to the linear
test statistic of (5) operating under mismatch conditions. The expressions in (16) and (17)
actually represent the drift and the variance of the diffusion, i.e., for large $n$,

$$E_k\{S_n\} \approx n\,\bar\mu_k(\hat g, F_k)$$

and

$$\operatorname{var}_k\{S_n\} \approx n\,\bar\sigma_k^2(\hat g, F_k^*).$$

Furthermore, the expressions (15a)-(15b) can be obtained in a manner similar to that used
for deriving equations (15a)-(15b) and (18a)-(18b) of [1]. Specifically, by neglecting the
overshoot phenomenon, so that $S_N$ equals $b$ with probability $\alpha$ and $a$ with probability $1 - \alpha$ under $H_0$
(and $a$ with probability $\beta$ and $b$ with probability $1 - \beta$ under $H_1$), and using Wald's approximations,
we obtain that under mismatch

$$E_0\{S_N\} = -\omega(\hat\alpha, \hat\beta; \alpha)$$

and

$$E_1\{S_N\} = \omega(\hat\beta, \hat\alpha; \beta).$$

Moreover, we can easily show that, for $N$ taking large values,

$$E_k\{S_N\} = \bar\mu_k(\hat g, F_k)\,E_k\{N\}.$$

Then (15a)-(15b) follow from a combination of the above equations.
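A small Monte Carlo experiment can indicate how close the approximations are in a simple matched case. The Python sketch below assumes an i.i.d. Gaussian shift problem, with $N(0,1)$ under $H_0$ and $N(\theta,1)$ under $H_1$, and the nonlinearity $g(x) = x$; with $\hat\mu_0 = 0$, $\hat\mu_1 = \theta$ and unit variances, the increment of the linear statistic is taken to be $\theta x - \theta^2/2$, which coincides with the log-likelihood ratio. The scenario, the parameter values, and the function names are illustrative assumptions and do not come from [1]. Since overshoot is not neglected in the simulation, the observed false alarm rate is expected to fall somewhat below $\hat\alpha$ and the observed mean sample number somewhat above the approximation.

```python
import math
import random


def run_sprt(rng, theta, mean, a, b):
    """One run of the sequential test; returns (decided_H1, sample_number)."""
    s, n = 0.0, 0
    while a < s < b:
        x = rng.gauss(mean, 1.0)
        # Increment of the linear statistic for g(x) = x, mu0 = 0, mu1 = theta,
        # unit variances: A * x + B with A = theta and B = -theta**2 / 2.
        s += theta * x - 0.5 * theta * theta
        n += 1
    return s >= b, n


def simulate(theta=0.5, alpha_hat=0.05, beta_hat=0.05, trials=4000, seed=1):
    """Simulate the test with data generated under H0 and compare with Wald's approximation."""
    rng = random.Random(seed)
    a = math.log(beta_hat / (1.0 - alpha_hat))
    b = math.log((1.0 - beta_hat) / alpha_hat)
    runs = [run_sprt(rng, theta, 0.0, a, b) for _ in range(trials)]
    false_alarm = sum(1 for decided, _ in runs if decided) / trials
    mean_n = sum(n for _, n in runs) / trials
    w = ((1.0 - alpha_hat) * math.log((1.0 - alpha_hat) / beta_hat)
         + alpha_hat * math.log(alpha_hat / (1.0 - beta_hat)))
    approx_n = w / (0.5 * theta * theta)  # w divided by the magnitude of the drift
    return false_alarm, mean_n, approx_n


false_alarm, mean_n, approx_n = simulate()
print(false_alarm, mean_n, approx_n)
```

Reducing $\theta$ or the desired error probabilities lengthens the test and should bring the simulated values closer to the approximations, in line with Remark 1.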

Next  we  establish  the  first  of  the  two  main  results  of  this  section  pertaining  to  the  error 
probabilities  of  the  sequential  test  of  (5). 

Proposition 2: Consider the sequential test which is based on the thresholds $a$ and $b$, obtained
from (6a)-(6b) for desired error probabilities $\hat\alpha$ and $\hat\beta$, and on the linear test statistic of (5),
which employs the nonlinearity $\hat g$ solving (7)-(8) with the kernels of (9) modified according to
(12b) as

$$\hat K_k(x,y) = 2R_k\hat f_k(x)\delta(x - y) - (1 + 2R_k)\hat f_k(x)\hat f_k(y) \tag{20}$$

where $\hat F_k$ ($\hat f_k$) are the cdfs (pdfs) singled out by Lemma 1 of Appendix A for the capacity
class of (A-1), and $\hat F_k^{(j)}$ satisfy (12a), for all $j$ and $k = 0, 1$. Then this test is least-favorable
for the error probabilities under the two hypotheses, that is,

$$P_k(\hat g, F_k^*) \le P_k(\hat g, \hat F_k^*) \tag{21}$$

for any $F_k^* = (F_k, \{F_k^{(j)}\}_{j=1}^{m_k})$ with $F_k$ in the capacity class $\mathcal{F}_k$ of (A-1) and $F_k^{(j)}$ satisfying (11).
In (21), $P_k(\hat g, F_k^*)$, the error probabilities under mismatch, are as defined in Proposition 1 by
(14a)-(14b).
Remark 3: This Proposition holds under the same conditions as Proposition 1. The issues
raised by Remark 1 about the accuracy of Wald's approximations and the Brownian-motion
(diffusion) approximation are also valid here. Inequality (21) can be expressed as $\alpha \le \hat\alpha$ and $\beta \le \hat\beta$ in the
notation of Proposition 1. The result in (21) is valid under the assumptions (A4) stated in
Remark 4 below.
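To see what the modified kernel of (20) does to the integral term of the linear integral equation, it may help to carry out the integration explicitly. The short derivation below is a worked step added for illustration; it uses only (20) and the definition of the mean in (3), with $\hat\mu_k$ denoting the mean of $g$ under $\hat f_k$.

```latex
\begin{aligned}
\int \hat K_k(x,y)\, g(y)\, dy
  &= 2R_k\, \hat f_k(x) \int \delta(x-y)\, g(y)\, dy
     \;-\; (1+2R_k)\, \hat f_k(x) \int \hat f_k(y)\, g(y)\, dy \\
  &= 2R_k\, \hat f_k(x)\, g(x) \;-\; (1+2R_k)\, \hat\mu_k\, \hat f_k(x).
\end{aligned}
```

Under this reading, the kernel acts on $g$ only through a pointwise multiplication and the scalar $\hat\mu_k$, which is one way to view the removal of the dependence from the observation sequence.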

Proof:  Proving  (21)  requires  several  steps.  We  start  by  using  the  fact  that,  since  FkJ)  satisfies 
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(12a),  the  equality  in  (13)  is  achieved,  and  we  can  write 

A  ----  Ag  A)  =  (1  +  2 Rk)oz(g ,Fk)  ■  (22) 

The quantity $\sigma^2(\hat g, \hat F_k)$ defined after (17) (for matched conditions) now depends only on the marginal cdf $\hat F_k$; this is equivalent to removing the dependence from the observation sequence and modifying the kernels of (9) according to (12b), so that they are given by (20). We return to this important point later in this proof. We use (22) to define

$$\tilde P_k(\hat g, \hat F_k) = P_k(\hat g, \hat F_k^*)$$  (23)

where $\tilde P_k(\hat g, \hat F_k)$, for k = 0, 1, is obtained from (14a)-(14b) by using (22) in (17) with $\hat F_k^*$ replacing $F_k^*$. The left-hand side of (23) depends only on the univariate (marginal) distribution $\hat F_k$. Furthermore, because of (13),

$$\sigma^2(\hat g, F_k^*) \le (1 + 2R_k)\,\sigma^2(\hat g, F_k)$$  (24)

for any $F_k^*$ with bivariates $F_k^{(j)}$ satisfying (11), for j = 1, 2, ..., and arbitrary marginals $F_k$. Therefore, since $P_k(\hat g, F_k^*)$ is an increasing function of $\tilde\sigma_k^2(\hat g, F_k^*)$ given by (17) and the latter is an increasing function of $\sigma^2(\hat g, F_k^*)$, which is the left-hand side of (24), we obtain

$$P_k(\hat g, F_k^*) \le \tilde P_k(\hat g, F_k).$$  (25)

In (25), $\tilde P_k(\hat g, F_k)$, for k = 0, 1, can be obtained from (14a)-(14b) by using the right-hand member of (24) for $\sigma^2(\hat g, F_k^*)$ in (17). The right-hand side of (25) now depends only on the marginal cdf $F_k$.

Upon substitution from (23) and (25) in (21), we find that (21) is valid if the following inequality holds:

$$\tilde P_k(\hat g, F_k) \le \tilde P_k(\hat g, \hat F_k)$$  (26)

for all the marginal cdfs $F_k$ in the class $\mathcal{F}_k$ given by (A-1) and $\hat F_k$ singled out by Lemma 1 of Appendix A. This inequality corresponds to the least-favorability condition for the new error probabilities $\tilde P_k(\hat g, F_k)$ obtained from the original ones $P_k(\hat g, F_k^*)$ by removing the dependence of the observation sequence through the use of the bounds of (11).

To prove the inequality in (26), we use some of the results on minimax robustness reviewed in Appendix A. Several steps are involved in the proof. First we exploit the fact that the mismatch error probability of (14a) is an increasing function of the normalized drift $c_0$, whereas (14b) is a decreasing function of the drift $c_1$, defined as $c_k = 2\tilde\mu_k(\hat g, F_k)/\tilde\sigma_k^2(\hat g, F_k^*)$ for k = 0, 1, and, for the worst case, as $\hat c_k = 2\tilde\mu_k(\hat g, \hat F_k)/\tilde\sigma_k^2(\hat g, \hat F_k^*)$. Removing the dependence in the observations through the bounds of (11) implies that $\hat c_0 = -1$ and $\hat c_1 = 1$. Using this and the definitions (16)-(17) we establish that the inequalities of (26), for k = 0, 1, are equivalent to the following inequalities characterizing the mismatch and matched worst-case situations:

$$c_0 = \frac{[\mu(\hat g,F_0)-\mu(\hat g,\hat F_0)](1+2R_1)\sigma^2(\hat g,\hat F_1) + [\mu(\hat g,F_0)-\mu(\hat g,\hat F_1)](1+2R_0)\sigma^2(\hat g,\hat F_0)}{[\mu(\hat g,\hat F_1)-\mu(\hat g,\hat F_0)](1+2R_0)\sigma^2(\hat g,F_0)} \le -1 = \hat c_0$$  (27a)

and

$$c_1 = \frac{[\mu(\hat g,F_1)-\mu(\hat g,\hat F_1)](1+2R_0)\sigma^2(\hat g,\hat F_0) + [\mu(\hat g,F_1)-\mu(\hat g,\hat F_0)](1+2R_1)\sigma^2(\hat g,\hat F_1)}{[\mu(\hat g,\hat F_1)-\mu(\hat g,\hat F_0)](1+2R_1)\sigma^2(\hat g,F_1)} \ge 1 = \hat c_1$$  (27b)

In (27a)-(27b), the means and variances involved depend only on the marginal cdfs $F_k$ and $\hat F_k$, as has already been shown above. In particular, both the matched worst-case variances $\hat\sigma_k^2$ and the mismatch variances $\tilde\sigma_k^2$ involve the same factor $(1 + 2R_k)$ after the dependence in the observations is removed in the expressions for $P_k(\hat g, F_k^*)$ and subsequently in the expressions for $c_k$. Under the assumption that

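$$\mu(\hat g,\hat F_1) > \mu(\hat g,\hat F_0)$$  (A1)

the conditions

$$\mu(\hat g,F_0) \le \mu(\hat g,\hat F_0) \quad\text{and}\quad \sigma^2(\hat g,\hat F_0) \ge \sigma^2(\hat g,F_0)$$  (28a)

are sufficient for (27a). To prove this notice that, if (28a) holds, then

$$[\mu(\hat g,F_0)-\mu(\hat g,\hat F_0)](1+2R_1)\sigma^2(\hat g,\hat F_1) + [\mu(\hat g,F_0)-\mu(\hat g,\hat F_1)](1+2R_0)\sigma^2(\hat g,\hat F_0)$$

$$\le [\mu(\hat g,F_0)-\mu(\hat g,\hat F_1)](1+2R_0)\sigma^2(\hat g,\hat F_0) \le [\mu(\hat g,\hat F_0)-\mu(\hat g,\hat F_1)](1+2R_0)\sigma^2(\hat g,\hat F_0)$$

$$= -[\mu(\hat g,\hat F_1)-\mu(\hat g,\hat F_0)](1+2R_0)\sigma^2(\hat g,\hat F_0) \le -[\mu(\hat g,\hat F_1)-\mu(\hat g,\hat F_0)](1+2R_0)\sigma^2(\hat g,F_0)$$

and thus $c_0 \le -1 = \hat c_0$. Similarly, the conditions

$$\mu(\hat g,F_1) \ge \mu(\hat g,\hat F_1) \quad\text{and}\quad \sigma^2(\hat g,\hat F_1) \ge \sigma^2(\hat g,F_1)$$  (28b)

are sufficient for (27b). To prove this notice that, if (28b) holds, then

$$[\mu(\hat g,F_1)-\mu(\hat g,\hat F_1)](1+2R_0)\sigma^2(\hat g,\hat F_0) + [\mu(\hat g,F_1)-\mu(\hat g,\hat F_0)](1+2R_1)\sigma^2(\hat g,\hat F_1)$$

$$\ge [\mu(\hat g,\hat F_1)-\mu(\hat g,\hat F_1)](1+2R_0)\sigma^2(\hat g,\hat F_0) + [\mu(\hat g,\hat F_1)-\mu(\hat g,\hat F_0)](1+2R_1)\sigma^2(\hat g,\hat F_1)$$

$$= [\mu(\hat g,\hat F_1)-\mu(\hat g,\hat F_0)](1+2R_1)\sigma^2(\hat g,\hat F_1) \ge [\mu(\hat g,\hat F_1)-\mu(\hat g,\hat F_0)](1+2R_1)\sigma^2(\hat g,F_1)$$

and thus $c_1 \ge 1 = \hat c_1$.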
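The sufficiency argument can be checked numerically. The following Python sketch evaluates the normalized drifts under the assumption that they take the form of (27a)-(27b), with the matched worst-case values equal to -1 and +1. The function name, the argument names, and the numbers used in the example call are hypothetical and only illustrate the inequalities.

```python
def normalized_drifts(mu0, mu1, var0, var1,
                      mu0_lf, mu1_lf, var0_lf, var1_lf, R0, R1):
    """Evaluate c_0 and c_1 in the assumed form of (27a)-(27b).

    mu_k, var_k       : mean and marginal variance of g(X) under the actual F_k
    mu_k_lf, var_k_lf : the same quantities under the least-favorable F_k
    R0, R1            : sums of the correlation bounds under H_0 and H_1
    """
    s0_lf = (1.0 + 2.0 * R0) * var0_lf   # matched worst-case variance under H_0
    s1_lf = (1.0 + 2.0 * R1) * var1_lf   # matched worst-case variance under H_1
    gap = mu1_lf - mu0_lf                # must be positive, see (A1)
    if gap <= 0.0:
        raise ValueError("assumption (A1) requires mu1_lf > mu0_lf")
    c0 = ((mu0 - mu0_lf) * s1_lf + (mu0 - mu1_lf) * s0_lf) \
        / (gap * (1.0 + 2.0 * R0) * var0)
    c1 = ((mu1 - mu1_lf) * s0_lf + (mu1 - mu0_lf) * s1_lf) \
        / (gap * (1.0 + 2.0 * R1) * var1)
    return c0, c1


# Hypothetical numbers satisfying (28a) and (28b).
c0, c1 = normalized_drifts(mu0=-0.6, mu1=0.7, var0=0.8, var1=0.9,
                           mu0_lf=-0.5, mu1_lf=0.5, var0_lf=1.0, var1_lf=1.0,
                           R0=0.2, R1=0.5)
print(c0 <= -1.0, c1 >= 1.0)
```

When the actual and least-favorable quantities coincide, the function returns the antipodal pair (-1, 1), which is the matched case.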

Now we show that conditions (28a)-(28b) are satisfied for the $\hat g$ that solves the linear integral equation (7), after the removal of dependence in the observations through the bounds of (11). As already discussed, we substitute $\hat f_k^{(j)}$ from (12b) for the $f_k^{(j)}$ in (9) to obtain the kernels $\hat K_k$ in (20); after some further manipulations, the linear integral equation (7) becomes

$$\frac{\hat f_1(x)-\hat f_0(x)}{w(\hat\alpha,\hat\beta)(1+2R_1)\hat f_1(x) + w(\hat\beta,\hat\alpha)(1+2R_0)\hat f_0(x)} = \hat g(x) - \int \frac{w(\hat\alpha,\hat\beta)(1+2R_1)\hat f_1(x)\hat f_1(y) + w(\hat\beta,\hat\alpha)(1+2R_0)\hat f_0(x)\hat f_0(y)}{w(\hat\alpha,\hat\beta)(1+2R_1)\hat f_1(x) + w(\hat\beta,\hat\alpha)(1+2R_0)\hat f_0(x)}\,\hat g(y)\,dy$$

or, equivalently, since $\hat g$ scales both members of the integral equation,

$$\frac{\hat f_1(x)-\hat f_0(x)}{A\hat f_1(x)+\hat f_0(x)} = \hat g(x) - \int \frac{A\hat f_1(x)\hat f_1(y)+\hat f_0(x)\hat f_0(y)}{A\hat f_1(x)+\hat f_0(x)}\,\hat g(y)\,dy$$  (29)

where

$$A = \frac{w(\hat\alpha,\hat\beta)(1+2R_1)}{w(\hat\beta,\hat\alpha)(1+2R_0)}.$$  (30)

An integral equation of the form (29) was solved in [6, Appendix C] for a different A. The solution to (29) is shown to be

$$\hat g(x) = \frac{\hat f_1(x)-\hat f_0(x)}{A\hat f_1(x)+\hat f_0(x)} = \frac{\pi_v(x)-1}{A\pi_v(x)+1}$$  (31)

for all $x \in \Omega$ (the sample space), where $\pi_v(x) = \hat f_1(x)/\hat f_0(x) \ge 0$ is the Huber-Strassen derivative defined in Lemma 1 of Appendix A of this paper.
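A minimal Python sketch of the resulting nonlinearity follows, assuming that (31) reads $\hat g = (\pi_v - 1)/(A\pi_v + 1)$ with A > 0. The function name and the sample values are hypothetical. Under this assumed form the nonlinearity is increasing in the likelihood ratio and bounded between -1 and 1/A, which is what makes the dominance arguments of Appendix A applicable.

```python
def robust_nonlinearity(pi, A):
    """Map a likelihood-ratio value pi >= 0 to g = (pi - 1) / (A*pi + 1).

    Assumed form of (31); A is the positive constant of (30).
    """
    if A <= 0.0:
        raise ValueError("A must be positive")
    if pi < 0.0:
        raise ValueError("a likelihood ratio cannot be negative")
    return (pi - 1.0) / (A * pi + 1.0)


# Hypothetical likelihood-ratio values; the output increases with pi
# and stays inside the interval [-1, 1/A).
A = 4.0
values = [robust_nonlinearity(pi, A) for pi in (0.0, 0.5, 1.0, 2.0, 100.0)]
print(values)
```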

In  Appendix  B,  we  use  the  dominance  properties  (A-3)-(A-4)  of  Lemma  1  of  Appendix 
A  to  prove  that  the  sufficient  conditions  for  the  minimax  robustness  of  the  error  probabilities  in 
(26)  [namely  (28a)  and  (28b)]  are  satisfied  for  the  nonlinearity  g  given  by  (31). 


Remark 4: Sufficient conditions for robustification are the assumptions (A2) and (A3) of Appendix B, which can be summarized as

$$\mu(\hat g,\hat F_0) \le 0 \le \mu(\hat g,\hat F_1)$$  (A4-1)

where the two equalities are not allowed to hold simultaneously (resulting in $\mu(\hat g,\hat F_1) - \mu(\hat g,\hat F_0) > 0$) and

A|x(^/!)>1.  (A4-2) 

These conditions are not particularly restrictive for most practical situations. Specifically, A [given by (30)] is typically a relatively large positive number (as is the case for the realistic discrimination scenarios considered in Section 4 of [1]), because, under $H_1$, the observations are usually more strongly positively correlated than under $H_0$; this implies that (A4-2) is easily satisfied. Furthermore, (A4-1) is satisfied in most situations in which a good choice of g has been made; it represents a condition for adequate separation of the means $\mu(\hat g,\hat F_k)$ (k = 0, 1) under the two hypotheses and, consequently, for a better performance of the sequential test of (5).

Finally,  we  prove  the  second  main  result  of  this  section,  which  pertains  to  the 
robustification  of  the  expected  sample  numbers  of  the  sequential  memoryless  test  of  (5). 


Proposition 3: Suppose that the same sequential test as in Proposition 2 is employed. Notation identical to that of Propositions 1 and 2 is used. Assume that, besides the assumptions (A4) of Remark 4, the following additional assumptions hold:

m-Lrl^a-aXi-p)  (A5-1) 

&  <5$ 

(A5.2) 

&  60 

Then the sequential test of (5) is minimax robust for the expected sample numbers under the two hypotheses, that is,

$$E_k\{N \mid \hat g, F_k^*\} \le E_k\{N \mid \hat g, \hat F_k^*\}, \quad\text{for } k = 0, 1,$$  (32a)

and

$$E_1\{N \mid \hat g, F_1^*\} + E_0\{N \mid \hat g, F_0^*\} \le E_1\{N \mid \hat g, \hat F_1^*\} + E_0\{N \mid \hat g, \hat F_0^*\} \le E_1\{N \mid g, \hat F_1^*\} + E_0\{N \mid g, \hat F_0^*\}$$  (32b)

for all marginal cdfs $F_k$ in the capacity class $\mathcal{F}_k$ of (A-1) with bivariates $F_k^{(j)}$ satisfying (11) and any measurable function g satisfying $E_k\{g^2(X_1)\} < \infty$. $\hat F_k$ ($\hat f_k$) is the cdf (pdf) singled out by Lemma 1 of Appendix A with bivariates $\hat F_k^{(j)}$ satisfying (12a), for all j. The expected sample numbers under mismatch $E_k\{N \mid \hat g, F_k^*\}$ are as defined by (15a)-(15b) of Proposition 1.


Remark 5: The assumptions (A5) are not very restrictive, since they are easily satisfied if both $\hat\alpha$ and $\hat\beta$ (the desirable error probabilities under worst-case conditions) are smaller than $10^{-3}$. Therefore, our results are valid for sequential tests with high reliability; recall that, according to Remarks 1 and 3, sufficiently small probabilities of false alarm and miss are necessary for Wald's approximations and the diffusion approximation used in Proposition 1 to be accurate.

Remark  6:  The  choice  of  the  linear  test  statistic  in  (5)  and  of  g  solving  the  linear  integral 
equation  of  (29)  restricts  the  validity  of  the  right-hand-side  inequality  in  (32b)  to  the  classes  of 
sequential  tests  employing  linear  test  statistics  and  solving  integral  equations.  However,  as 
already  discussed  at  the  beginning  of  Section  4,  these  choices  are  well  justified  by  practical 
considerations. 


Remark 7: The inequalities in (32) are not inequalities in the strict sense; this becomes clear in the proof that follows immediately below and is related to the assumptions (A5-1) and (A5-2). However, these inequalities are satisfied, for all practical purposes, when $\hat\alpha$ and $\hat\beta$ are sufficiently small (refer to Remark 5). In particular, as $\hat\alpha \to 0$ and $\hat\beta \to 0$, the required number of samples $N \to \infty$ under both hypotheses, the thresholds a and b obtained from (6a)-(6b) for the desirable error probabilities $\hat\alpha$ and $\hat\beta$ become $a \to -\infty$ and $b \to \infty$, and (32) reduces to the asymptotic result

$$E_0\{N_\infty \mid \hat g, F_0^*\} = \frac{-a}{-\tilde\mu_0(\hat g, F_0)} \le \frac{-a}{-\tilde\mu_0(\hat g, \hat F_0)} = E_0\{N_\infty \mid \hat g, \hat F_0^*\}$$  (33a)

$$E_1\{N_\infty \mid \hat g, F_1^*\} = \frac{b}{\tilde\mu_1(\hat g, F_1)} \le \frac{b}{\tilde\mu_1(\hat g, \hat F_1)} = E_1\{N_\infty \mid \hat g, \hat F_1^*\}$$  (33b)

$$E_1\{N_\infty \mid \hat g, F_1^*\} + E_0\{N_\infty \mid \hat g, F_0^*\} \le E_1\{N_\infty \mid \hat g, \hat F_1^*\} + E_0\{N_\infty \mid \hat g, \hat F_0^*\} \le E_1\{N_\infty \mid g, \hat F_1^*\} + E_0\{N_\infty \mid g, \hat F_0^*\}$$  (33c)

The quantities involved in these inequalities are termed asymptotic speeds of the SPRT (matched and mismatched ones), the asymptotic nature being denoted by the subscript $\infty$ of the required sample number N.

Remark  8:  As  promised  in  Remark  2,  the  numerators  of  (15a)-(15b)  are  positive  and  so  are 
the  denominators  for  the  situations  of  interest  in  this  paper.  This  definitely  holds  for  the  robust 
sequential  test  of  (5)  and  observations  with  marginal  cdfs  within  the  capacity  classes  of  (A-l) 
and  bivariates  satisfying  (11),  and  is  established  in  the  proof  below. 

Proof: The right-hand-side inequality in (32b) follows from the fact that $\hat g$ is selected to optimize the sum of the average sample numbers under the two hypotheses (for desirable error probabilities smaller than $\hat\alpha$ and $\hat\beta$) of the sequential test (5), when the cdfs are $\hat F_k$ (k = 0, 1) and the dependence of the observations has been removed through the bounds of (11) and (20). Under these conditions, $\hat g$ solves (29), which is a version of the linear integral equation of (7). In this context, $\hat g$ is the optimal nonlinearity among those used in a sequential test that employs a linear test statistic and that solve a linear integral equation. This optimization was discussed in detail in Section 2 of Part I of this study (see [1]).

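The left-hand-side inequality in (32a) is established as follows. We prove the result for k = 1; a similar proof holds for k = 0, as well. From (15b) and the definition of $\omega(x,y;z)$ in (19) we can rewrite the left-hand side of (32a), for k = 1, in the equivalent form

$$E_1\{N \mid \hat g, F_1^*\} = \frac{\ln[(1-\hat\beta)/\hat\alpha] - \beta \ln[(1-\hat\alpha)(1-\hat\beta)/(\hat\alpha\hat\beta)]}{\tilde\mu_1(\hat g,F_1)} \approx \frac{\ln[(1-\hat\beta)/\hat\alpha]}{\tilde\mu_1(\hat g,F_1)}$$

$$\le \frac{\ln[(1-\hat\beta)/\hat\alpha]}{\tilde\mu_1(\hat g,\hat F_1)} \approx \frac{\ln[(1-\hat\beta)/\hat\alpha] - \hat\beta \ln[(1-\hat\alpha)(1-\hat\beta)/(\hat\alpha\hat\beta)]}{\tilde\mu_1(\hat g,\hat F_1)} = E_1\{N \mid \hat g, \hat F_1^*\}$$  (34)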

In proving (34), we first establish that the numerators of (15a)-(15b) are positive for all elements in the uncertainty classes considered in this paper, as predicted in Remarks 8 and 2. We use the facts that $\alpha \le \hat\alpha$ and $\beta \le \hat\beta$, as established in Proposition 2 [refer to inequality (21) and Remark 3], and that $\alpha + \beta < 1$ to show that

$$\omega(\hat\beta,\hat\alpha;\beta) \ge \omega(\hat\beta,\hat\alpha;\hat\beta) = w(\hat\beta,\hat\alpha) > 0$$  (35a)

and

$$\omega(\hat\alpha,\hat\beta;\alpha) \ge \omega(\hat\alpha,\hat\beta;\hat\alpha) = w(\hat\alpha,\hat\beta) > 0$$  (35b)

where the final inequalities come from the discussion following (10). Moreover, under assumption (A4-1) and, as (28a)-(28b) hold for all $F_k$ in the capacity class of (A-1), we have that

$$\tilde\mu_1(\hat g,F_1) \ge \tilde\mu_1(\hat g,\hat F_1) = \frac{2(\hat\mu_1-\hat\mu_0)^2\,\hat\sigma_1^2}{(\hat\sigma_1^2+\hat\sigma_0^2)^2} > 0$$  (36a)

and

$$\tilde\mu_0(\hat g,F_0) \le \tilde\mu_0(\hat g,\hat F_0) = -\frac{2(\hat\mu_1-\hat\mu_0)^2\,\hat\sigma_0^2}{(\hat\sigma_1^2+\hat\sigma_0^2)^2} < 0$$  (36b)

where $\hat\sigma_k^2 = (1 + 2R_k)\sigma^2(\hat g,\hat F_k)$ and $\hat\mu_k = \mu(\hat g,\hat F_k)$, for k = 0, 1, as defined in the proof of Proposition 2. This establishes that the denominators in (15a)-(15b) are strictly positive, for all elements in the uncertainty classes of interest.
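The interplay between numerators and denominators can be illustrated with a short Python sketch. It assumes that the function of (19) has the form $\omega(x,y;z) = \ln[(1-x)/y] - z\ln[(1-x)(1-y)/(xy)]$, which is the form implied by the numerators in (34), and that the expected sample number under $H_1$ is the ratio of this quantity to the drift. The function names and all numerical values are hypothetical.

```python
import math


def omega(x, y, z):
    """Assumed form of the function in (19): numerator of the expected sample number."""
    return math.log((1.0 - x) / y) - z * math.log((1.0 - x) * (1.0 - y) / (x * y))


def wald_w(x, y):
    """Matched value w(x, y) = omega(x, y; x), as in (35a)-(35b)."""
    return omega(x, y, x)


def expected_samples_h1(alpha_hat, beta_hat, beta, drift1):
    """Expected sample number under H_1 for actual miss probability beta and drift drift1 > 0."""
    if drift1 <= 0.0:
        raise ValueError("the drift under H_1 must be positive")
    return omega(beta_hat, alpha_hat, beta) / drift1


# Hypothetical design values: the mismatch numerator is slightly larger than the
# matched one, but a larger drift still gives a smaller expected sample number.
alpha_hat = beta_hat = 1e-3
mismatch = expected_samples_h1(alpha_hat, beta_hat, beta=5e-4, drift1=0.6)
matched = expected_samples_h1(alpha_hat, beta_hat, beta=beta_hat, drift1=0.5)
print(mismatch <= matched)
```

With design error probabilities of the order of $10^{-3}$ the two numerators differ by well under one percent, which is why the inequality between the drifts decides the comparison.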

Returning to the proof of (34), we use (21), which for k = 1 is equivalent to $\beta \le \hat\beta$, the fact that condition (A5-1) holds for $\hat\beta$ sufficiently small, and (36a) to obtain the two approximations in (34). Then the inequality in (34) follows from (36a), since all terms involved are positive. Thus, although the initial numerators of (34) satisfy the opposite inequality from the desirable one [see (35a)], assumption (A5-1) and the correct inequality satisfied by the denominators [see (36a)] prevail to render the desirable inequality in the middle of (34) and thus the left-hand side of (32a). The left-hand side of (32b) follows trivially. Notice that, for the asymptotic results of Remark 7, the numerators of (33a)-(33b) are strictly positive and the denominators satisfy the correct inequalities, so that the inequalities in (33a)-(33b) are strict and do not require the assumptions (A5), which are trivially satisfied when $\hat\alpha \to 0$ and $\hat\beta \to 0$.




3.  Robust  Sequential  Memoryless  Detectors  for  Weak  Signals 
3.1  Preliminaries 

In this section, we consider the following special case of (1) pertaining to the detection of a weak signal in dependent non-Gaussian noise: we must decide between the two hypotheses

$$H_0:\; X_i = N_i, \quad\text{for } i = 1, 2, \ldots, n$$  (37a)

and

$$H_1:\; X_i = N_i + \theta, \quad\text{for } i = 1, 2, \ldots, n,$$  (37b)

where $\{N_i\}_{i=1}^{n}$ is the noise sequence, assumed to be m-dependent or $\rho$-mixing, and $\theta$ is a known weak signal, i.e., $\theta \to 0$.

As in [5], the stationary noise sequence has a symmetric marginal pdf $f(x) = f(-x)$ belonging to an $\varepsilon$-contaminated uncertainty class (see [7]):

$$f(x) = (1 - \varepsilon) f^0(x) + \varepsilon \tilde f(x)$$  (38)

where $f^0(x)$ is a known symmetric pdf (termed nominal), $\varepsilon$ ($0 \le \varepsilon < 1$) the known degree of uncertainty, and $\tilde f(x)$ an arbitrary symmetric pdf.

The following conditions are assumed to hold about the nonlinearity g and the marginal pdf of the noise f:

$$g(-x) = -g(x),$$  (39)

$$\frac{\partial}{\partial\theta}\left[\int g(x) f(x-\theta)\,dx\right]_{\theta=0} = -\int g(x) f'(x)\,dx,$$  (40a)

$$\lim_{l\to\infty} \int g(x) f'(x-\theta_l)\,dx = \int g(x) f'(x)\,dx$$  (40b)

for $\theta_l \to 0$ as $l \to \infty$,

$$\lim_{t\to 0} E\{[g(N_1 + t) - g(N_1)]^2\} = 0,$$  (40c)

$$\int g(x) f'(x)\,dx < 0$$  (41)

and

$$\sigma_0^2(g) = E\{[g(N_1)]^2\} + 2\sum_{j=1}^{\infty} E\{g(N_1) g(N_{j+1})\} > 0$$  (42)

for a $\rho$-mixing stationary noise sequence; the $\infty$ in the sum of (42) should be replaced by the parameter m for an m-dependent stationary noise sequence.

The bivariate pdf of the noise sequence, denoted by $f^{(j)}(x,y)$ for the pair $(N_1, N_{j+1})$, satisfies an inequality similar to (11) (see [6]), for k = 0, that is,

$$\frac{|\mathrm{cov}\{g(N_1), g(N_{j+1})\}|}{\sqrt{\mathrm{var}\{g(N_1)\}\,\mathrm{var}\{g(N_{j+1})\}}} = \frac{|E\{g(N_1) g(N_{j+1})\}|}{E\{[g(N_1)]^2\}} \le r_j$$  (43)

where g ranges over all measurable functions satisfying $E\{[g(N_1)]^2\} < \infty$. The parameters $r_j$ can be obtained from the parameters $\rho_j$ of the $\rho$-mixing process $\{N_i\}_{i=1}^{\infty}$ (refer to the discussion following (11) in Section 2.1 for more details). We denote by $f^*$ the collection $(f, \{f^{(j)}\}_{j=1}^{\infty})$. For a given marginal pdf f, equality holds in (43) for all g, if the bivariate pdf takes the form

$$f^{(j)}(x,y) = (1 - r_j) f(x) f(y) + r_j f(x)\,\delta(x-y)$$  (44)

where $\delta(x)$ is Dirac's $\delta$ function. In [6] random processes with pdfs of the form (44) have been constructed. Similarly to (13), if the condition (43) is satisfied,

$$\sigma_0^2(g) \le (1 + 2R)\, E\{[g(N_1)]^2\}$$  (45)

where $R = \sum_{j=1}^{\infty} r_j$ for a $\rho$-mixing noise sequence; $\infty$ should be replaced by m for an m-dependent noise sequence. The equality in (45) is satisfied for all g, if $f^{(j)}$ is given by (44); thus, the pdf of (44) has maximum variance among all pdfs in the class defined by (11).

As discussed in [4], when defining the sequence $\theta_l = c/\sqrt{l} \to 0$ as $l \to \infty$, we use an index other than n (the sample number) for the sequential detection problem; thus, $l \to \infty$ implies $\theta_l \to 0$ and the required sample number for the SPRT $N \to \infty$, as well.
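The claim that bivariates of the form (44) achieve equality in (45) can be checked on a toy example. The Python sketch below replaces the continuous pdf by a symmetric discrete distribution, so that the Dirac term in (44) becomes an indicator of equal indices; this discretization, the function names, and the numbers are assumptions made only for illustration.

```python
def soft_limiter(x, k=1.5):
    """An odd nonlinearity, used here only as a hypothetical example of g."""
    return max(-k, min(k, x))


def asymptotic_variance(values, probs, g, r):
    """Evaluate (42) for a discrete analogue of the bivariate model (44).

    values, probs : support points and probabilities of a symmetric marginal
    g             : odd nonlinearity
    r             : list of correlation bounds r_1, ..., r_m
    """
    gv = [g(x) for x in values]
    total = sum(p * v * v for p, v in zip(probs, gv))      # E{g^2(N_1)}
    for rj in r:
        cross = 0.0
        for i, (pi, gi) in enumerate(zip(probs, gv)):
            for k, (pk, gk) in enumerate(zip(probs, gv)):
                # discrete counterpart of (1 - r_j) f(x) f(y) + r_j f(x) delta(x - y)
                joint = (1.0 - rj) * pi * pk + (rj * pi if i == k else 0.0)
                cross += joint * gi * gk                    # E{g(N_1) g(N_{j+1})}
        total += 2.0 * cross
    return total


values = [-2.0, -1.0, 0.0, 1.0, 2.0]
probs = [0.1, 0.2, 0.4, 0.2, 0.1]
r = [0.3, 0.1]                                              # so that R = 0.4
second_moment = sum(p * soft_limiter(x) ** 2 for x, p in zip(values, probs))
print(asymptotic_variance(values, probs, soft_limiter, r), (1 + 2 * sum(r)) * second_moment)
```

Both printed numbers agree, because for an odd g and a symmetric marginal the product term in (44) contributes nothing and each cross moment reduces to $r_j E\{g^2(N_1)\}$.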

The sequential test (SPRT) to be robustified is based on the linear test statistic of (5) and the thresholds (6a)-(6b) of Section 2.1 with the necessary adjustments for the weak-signal in additive noise case. Actually, as proved in Section 2 of Part I (see equation (12) of [1] and the subsequent discussion) the modified test statistic

$$S_n = \frac{\hat\mu_\theta}{\hat\sigma_0^2}\left[\sum_{i=1}^{n} \hat g(X_i) - \frac{n\,\hat\mu_\theta}{2}\right]$$  (46)

is optimal within the class of SPRTs based on memoryless nonlinearities, in the sense that it is a likelihood ratio sequential test performed on $\sum_{i=1}^{n} \hat g(X_i)$. To minimize the expected sample numbers under the two hypotheses for desirable error probabilities $\hat\alpha$ and $\hat\beta$, the nonlinearity $\hat g$ in (46) must maximize the asymptotic relative efficiency (ARE) $[\int \hat g(x)\hat f'(x)\,dx]^2/\hat\sigma_0^2(\hat g)$ (refer to equation (22) of [1]). Therefore, $\hat g$ is the solution to the linear integral equation

$$\hat g(x) = -\frac{\hat f'(x)}{\hat f(x)} - \int \hat K(x,y)\,\hat g(y)\,dy$$  (47)

where the kernel $\hat K(x,y)$ is defined by

$$\hat K(x,y) = \sum_{j=1}^{\infty}\left[\hat f^{(j)}(x,y) + \hat f^{(j)}(y,x)\right]$$  (48)

for a $\rho$-mixing sequence; $\infty$ should be replaced by m for an m-dependent noise sequence. In (47) and (48) $\hat f(x)$ and $\{\hat f^{(j)}(x,y)\}_{j=1}^{\infty}$ denote marginal and bivariate pdfs of the noise sequence that either represent estimates of the statistics of the noise sequence and thus are different from the actual statistics of the noise sequence, or may be chosen to characterize the least-favorable conditions for the operation of the test of (46) within specific uncertainty classes [like those of (38) and (42)] as will be done in Section 3.2 below. For notational convenience, we use $\hat f^* = (\hat f, \{\hat f^{(j)}\}_{j=1}^{\infty})$. The above situation is clearly one of mismatch, since the operating conditions of the above test statistic are determined by the actual statistics of the noise sequence, namely the marginal pdfs f in the class (38) and the bivariate pdfs $\{f^{(j)}\}_{j=1}^{\infty}$ in the class (42), which are generally different from the ones involved in (46) and (47)-(48). Finally, in (46) the mean $\hat\mu_\theta$ is given by

$$\hat\mu_\theta = \int \hat g(x)\hat f(x-\theta)\,dx \ne 0$$  (49)

whereas the mean $\hat\mu_0 = \int \hat g(x)\hat f(x)\,dx = 0$ since $\hat g$ and $\hat f$ are odd and even functions, respectively. This last fact justifies why $\hat\mu_0$ is not present in (46), which was directly derived from (5). The variance $\hat\sigma_0^2$ is obtained from (42) upon substitution for $\hat g$ and $\hat f^*$.
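The operation of the test of (46) can be sketched in a few lines of Python. The sketch assumes that the statistic has the form $S_n = (\hat\mu_\theta/\hat\sigma_0^2)[\sum_i \hat g(X_i) - n\hat\mu_\theta/2]$ and that the thresholds of (6a)-(6b), which are not reproduced in this section, are Wald's usual choices $a = \ln[\hat\beta/(1-\hat\alpha)]$ and $b = \ln[(1-\hat\beta)/\hat\alpha]$. The function name, its arguments, and the data are hypothetical.

```python
import math


def sprt_weak_signal(samples, g, mu_theta, sigma0_sq, alpha_hat, beta_hat):
    """Run a memoryless sequential test with the assumed statistic of (46).

    samples   : iterable of observations X_1, X_2, ...
    g         : odd memoryless nonlinearity
    mu_theta  : design mean of g(X) under H_1
    sigma0_sq : design asymptotic variance of the sum of g(X_i)
    Returns the decision ("H0", "H1" or "undecided") and the number of samples used.
    """
    a = math.log(beta_hat / (1.0 - alpha_hat))      # lower threshold (assumed Wald form)
    b = math.log((1.0 - beta_hat) / alpha_hat)      # upper threshold (assumed Wald form)
    scale = mu_theta / sigma0_sq
    s = 0.0
    n = 0
    for x in samples:
        n += 1
        s += scale * (g(x) - mu_theta / 2.0)        # recursive update of S_n
        if s >= b:
            return "H1", n
        if s <= a:
            return "H0", n
    return "undecided", n


# Hypothetical usage with a soft limiter and constant data.
limiter = lambda x: max(-1.5, min(1.5, x))
print(sprt_weak_signal([1.0] * 20, limiter, 0.5, 1.0, 0.01, 0.01))
```

The recursive update needs only one evaluation of g per sample, which is the practical appeal of a memoryless statistic.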

3.2  Robustification  of  Sequential  Memoryless  Detectors  for  Weak  Signals 

First, we evaluate the error probabilities and the expected sample numbers of the sequential test of (46) under mismatch.


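Proposition 4: Let $P_k(\hat g, f^*)$ denote the probability of error under mismatch and $E_k\{N \mid \hat g, f^*\}$ the required average sample number, when hypothesis $H_k$ (k = 0, 1) of (37a) or (37b) is true. Let us assume that the sequential test of (46) with thresholds a and b defined by (6a)-(6b) for desired error probabilities $\hat\alpha$ and $\hat\beta$ is employed. Then the following identities hold, under the assumptions (39)-(42):

$$P_0(\hat g, f^*) = \alpha = P_0\{S_N \text{ up-crosses } b \text{ before it down-crosses } a\} = \frac{1 - \exp\!\left[\dfrac{2\tilde\mu_0(\hat g,f)}{\tilde\sigma_0^2(\hat g,f^*)}\,a\right]}{1 - \exp\!\left[\dfrac{2\tilde\mu_0(\hat g,f)}{\tilde\sigma_0^2(\hat g,f^*)}\,(a-b)\right]}$$  (50a)

$$P_1(\hat g, f^*) = \beta = P_1\{S_N \text{ down-crosses } a \text{ before it up-crosses } b\} = \frac{1 - \exp\!\left[\dfrac{2\tilde\mu_1(\hat g,f)}{\tilde\sigma_0^2(\hat g,f^*)}\,b\right]}{1 - \exp\!\left[\dfrac{2\tilde\mu_1(\hat g,f)}{\tilde\sigma_0^2(\hat g,f^*)}\,(b-a)\right]}$$  (50b)

and

$$E_0\{N \mid \hat g, f^*\} = \frac{\omega(\hat\alpha,\hat\beta;\alpha)}{-\tilde\mu_0(\hat g,f)}$$  (51a)

$$E_1\{N \mid \hat g, f^*\} = \frac{\omega(\hat\beta,\hat\alpha;\beta)}{\tilde\mu_1(\hat g,f)}$$  (51b)

where

$$\tilde\mu_1(\hat g,f) = \frac{\hat\mu_\theta}{\hat\sigma_0^2}\left[\mu_\theta(\hat g,f) - \frac{\hat\mu_\theta}{2}\right]$$  (52a)

$$\tilde\mu_0(\hat g,f) = -\frac{\hat\mu_\theta^2}{2\hat\sigma_0^2}$$  (52b)

and

$$\tilde\sigma_0^2(\hat g,f^*) = \frac{\hat\mu_\theta^2}{\hat\sigma_0^4}\,\sigma^2(\hat g,f^*).$$  (53)

In (52)-(53), $\mu_\theta(\hat g,f) = \int \hat g(x) f(x-\theta)\,dx = \lim_{n\to\infty} n^{-1}E\{T_n\}$ denotes the asymptotic mean and $\sigma^2(\hat g,f^*) = \lim_{n\to\infty} n^{-1}E\{[T_n]^2\}$ denotes the asymptotic variance [obtained from (42) for $\hat g$ and marginal/bivariate pair $f^*$] under mismatch of $T_n = \sum_{i=1}^{n}\hat g(X_i)$. The corresponding means and variances under matched conditions are denoted by $\hat\mu_\theta = \mu_\theta(\hat g,\hat f)$ and $\hat\sigma_0^2 = \sigma^2(\hat g,\hat f^*)$. The rest of the quantities involved in (50)-(51) are $\hat\alpha$ and $\hat\beta$, the error probabilities under matched conditions still given by (18a)-(18b), and the quantity $\omega(x,y;z)$ defined by (19).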
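The dependence of the error probabilities on the normalized drifts can be explored with the following Python sketch. It assumes the standard two-barrier hitting probabilities of a Brownian motion with drift, written in terms of $c_k = 2\tilde\mu_k/\tilde\sigma_0^2$, together with Wald's thresholds $a = \ln[\hat\beta/(1-\hat\alpha)]$ and $b = \ln[(1-\hat\beta)/\hat\alpha]$; both are assumptions standing in for (50a)-(50b) and (6a)-(6b). The function name and the numbers are hypothetical.

```python
import math


def error_probabilities(c0, c1, alpha_hat, beta_hat):
    """Diffusion-approximation error probabilities for normalized drifts c0 < 0 < c1.

    c0, c1 : normalized drifts 2*mu/sigma^2 of the test statistic under H_0 and H_1
    Returns (alpha, beta), the false-alarm and miss probabilities.
    Note: the expressions are undefined for a zero drift.
    """
    a = math.log(beta_hat / (1.0 - alpha_hat))       # lower threshold, a < 0
    b = math.log((1.0 - beta_hat) / alpha_hat)       # upper threshold, b > 0
    # probability of reaching b before a when the drift is c0
    alpha = (1.0 - math.exp(c0 * a)) / (1.0 - math.exp(c0 * (a - b)))
    # probability of reaching a before b when the drift is c1
    beta = (1.0 - math.exp(c1 * b)) / (1.0 - math.exp(c1 * (b - a)))
    return alpha, beta


# Matched case: antipodal drifts reproduce the design values.
print(error_probabilities(-1.0, 1.0, 0.01, 0.01))
# Hypothetical mismatch with drifts beyond the antipodal values: smaller errors.
print(error_probabilities(-1.5, 1.5, 0.01, 0.01))
```

In this sketch, drifts with $c_0 \le -1$ and $c_1 \ge 1$ give error probabilities no larger than the design values, which is the mechanism exploited in the robustness proofs.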


Remark 9: Remarks similar to Remarks 1 and 2 made for Proposition 1 are valid here. Also, the proof of Proposition 4 is very similar to that of Proposition 1 and is omitted.

Next we establish the first of the two main results of this section pertaining to the robustness of the sequential test of (46) with respect to the error probabilities.

Proposition 5: Consider a sequential test based on the thresholds a and b, obtained from (6a)-(6b) for desired error probabilities $\hat\alpha$ and $\hat\beta$, and on the linear test statistic of (46), where the nonlinearity $\hat g$ solves (47) with the kernels of (48) modified according to (44) as

$$\hat K(x,y) = 2R\,\hat f(x)\,\delta(x-y)$$  (54)

and is given by

$$\hat g(x) = -\hat f'(x)/\hat f(x).$$  (55)

In (54) $\hat f$ is the least-favorable pdf for the ARE and the mean-square estimation error for uncertainty in the marginal pdf of the noise within the class of (38); $\hat f$ has been evaluated by Huber in [7]; moreover, $\hat f^{(j)}$ satisfies (44), for all j. Then this test is least-favorable for the error probabilities under the two hypotheses, that is,

$$P_k(\hat g, f^*) \le P_k(\hat g, \hat f^*), \quad\text{for } k = 0, 1,$$  (56)

for any $f^* = (f, \{f^{(j)}\}_{j=1}^{\infty})$ with f in the class of (38) and $f^{(j)}$ satisfying (43). In (56), $P_k(\hat g, f^*)$, the error probabilities under mismatch, are as defined in Proposition 4 by (50a)-(50b).

Remark  10:  This  Proposition  holds  under  the  same  conditions  as  Proposition  4. 

Proof: The sequence of steps necessary for proving (56) is similar to that used for the proof of Proposition 2, but the individual steps differ. Here we sketch the proof and cite the points that are different.

On the basis of (42)-(45) we can derive the equality

$$\hat\sigma_0^2 = \sigma^2(\hat g, \hat f^*) = (1 + 2R)\,\sigma^2(\hat g, \hat f)$$  (57)

where $\sigma^2(\hat g, \hat f)$ depends only on $\hat f$, and the inequality

$$\sigma^2(\hat g, f^*) \le (1 + 2R)\,\sigma^2(\hat g, f)$$  (58)

for all f with bivariates $f^{(j)}$ in (43), where $\sigma^2(\hat g, f)$ depends only on f belonging to (38). Using these two results and the fact that $P_k(\hat g, f^*)$ of (50a)-(50b) is an increasing function of $\tilde\sigma_0^2(\hat g, f^*)$, and thus of $\sigma^2(\hat g, f^*)$, we deduce that (56) is equivalent to

$$\tilde P_k(\hat g, f) \le \tilde P_k(\hat g, \hat f), \quad\text{for } k = 0, 1,$$  (59)

where $\tilde P_k(\hat g, f)$ is obtained from (50a)-(50b) by using the right-hand member of (58) in (53). Equation (59) involves only the marginal pdfs f and $\hat f$ and this simplifies considerably the final part of the proof.


To prove (59) we observe that (50a) is an increasing function of the normalized drift $c_0$, whereas (50b) is a decreasing function of the drift $c_1$, defined as $c_k = 2\tilde\mu_k(\hat g,f)/\tilde\sigma_0^2(\hat g,f^*)$, for k = 0, 1, and for the worst case $\hat c_k = 2\tilde\mu_k(\hat g,\hat f)/\tilde\sigma_0^2(\hat g,\hat f^*)$. Removing the dependence in the observations through the bounds of (43) and (44) implies that $\hat c_0 = -1$ and $\hat c_1 = 1$. Thus (59) becomes equivalent to

$$c_0 = -\frac{(1+2R)\,\sigma^2(\hat g,\hat f)}{(1+2R)\,\sigma^2(\hat g,f)} \le -1 = \hat c_0$$  (60a)

and

$$c_1 = \frac{[2\mu_\theta(\hat g,f) - \hat\mu_\theta](1+2R)\,\sigma^2(\hat g,\hat f)}{\hat\mu_\theta(1+2R)\,\sigma^2(\hat g,f)} \ge 1 = \hat c_1$$  (60b)

These inequalities are satisfied, if

$$\sigma^2(\hat g,\hat f) \ge \sigma^2(\hat g,f)$$  (61a)

and

$$\mu_\theta(\hat g,f) \ge \mu_\theta(\hat g,\hat f) = \hat\mu_\theta > 0.$$  (61b)

The last two conditions can be put in the following equivalent forms

$$\int \hat g^2(x)\hat f(x)\,dx \ge \int \hat g^2(x) f(x)\,dx$$  (62a)

and

$$-\int \hat g(x) f'(x)\,dx \ge -\int \hat g(x)\hat f'(x)\,dx > 0.$$  (62b)

Obtaining (62a) from (61a) is trivial. To obtain (62b) from (61b) we subtract $\mu_0(\hat g,f)$ from the left-hand side and $\mu_0(\hat g,\hat f)$ from the right-hand side of (61b). Since both these terms are 0 (due to $\hat g$ being an odd and any f in (38) being an even function) we obtain

$$\int \hat g(x)[f(x-\theta) - f(x)]\,dx \ge \int \hat g(x)[\hat f(x-\theta) - \hat f(x)]\,dx$$

which after dividing by $\theta > 0$ and taking the limit as $\theta \to 0$ yields (62b).

Notice that the conditions (62a)-(62b) are satisfied for all f in the class (38) if $\hat f$ is the least-favorable pdf for the ARE derived in [7] and $\hat g$ is given by (55). Indeed, in the proof of [7] the ARE was robustified by minimizing its numerator and maximizing its denominator; the former corresponds to (62b) and the latter to (62a) in our situation. The inequality $\hat\mu_\theta > 0$ or equivalently $-\int \hat g(x)\hat f'(x)\,dx > 0$ follows from assumption (41). This completes the proof of Proposition 5.
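For a concrete instance of the nonlinearity in (55), the Python sketch below takes the nominal pdf to be standard Gaussian, which is an assumption made here for illustration and not a restriction of the paper. In that case Huber's least-favorable density for the $\varepsilon$-contaminated class is Gaussian in the center with exponential tails, so that $-\hat f'/\hat f$ is a soft limiter with corner k solving $2\varphi(k)/k - 2\Phi(-k) = \varepsilon/(1-\varepsilon)$. The function names are hypothetical.

```python
import math


def huber_corner(eps, tol=1e-12):
    """Solve 2*phi(k)/k - 2*Phi(-k) = eps/(1 - eps) for k by bisection.

    Assumes a standard Gaussian nominal pdf and 0 < eps < 1.
    """
    if not 0.0 < eps < 1.0:
        raise ValueError("eps must lie strictly between 0 and 1")
    target = eps / (1.0 - eps)

    def excess(k):
        phi = math.exp(-0.5 * k * k) / math.sqrt(2.0 * math.pi)   # Gaussian pdf
        tail = 0.5 * math.erfc(k / math.sqrt(2.0))                # Phi(-k)
        return 2.0 * phi / k - 2.0 * tail - target

    lo, hi = 1e-9, 40.0            # excess(lo) > 0 > excess(hi); excess is decreasing
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def robust_score(x, k):
    """Soft limiter: equals x for |x| <= k and k*sign(x) otherwise."""
    return max(-k, min(k, x))


k = huber_corner(0.1)
print(k, robust_score(3.0, k), robust_score(-0.5, k))
```

The limiter bounds the influence of any single observation, which is consistent with the maximization of the second moment in (62a) over the contaminated class.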

Remark 11: Proposition 5 holds not only for classes of the form (38) for the marginal pdfs of the noise sequence, but also for any other class of pdfs for which the numerator and denominator of the ARE are respectively minimized and maximized simultaneously by the same least-favorable pdf $\hat f$ in the class. For example, for the total variation uncertainty class, also introduced in [7], (62a)-(62b) hold and so do Propositions 5 and 6.

Finally, we prove the second main result of this section, which pertains to the robustness of the sequential memoryless test of (46) with respect to the expected sample numbers.

Proposition 6: Suppose that the same sequential test as in Proposition 5 is employed. Notation identical to that of Propositions 4 and 5 is used. Assume that conditions (62a)-(62b) and assumptions (A5-1)-(A5-2) hold. Then the sequential test of (46) is minimax robust for the expected sample numbers under the two hypotheses, that is,

$$E_k\{N \mid \hat g, f^*\} \le E_k\{N \mid \hat g, \hat f^*\}, \quad\text{for } k = 0, 1,$$  (63a)

and

$$E_1\{N \mid \hat g, f^*\} + E_0\{N \mid \hat g, f^*\} \le E_1\{N \mid \hat g, \hat f^*\} + E_0\{N \mid \hat g, \hat f^*\} \le E_1\{N \mid g, \hat f^*\} + E_0\{N \mid g, \hat f^*\}$$  (63b)

for all marginal pdfs f in the class (38) with bivariates $f^{(j)}$ satisfying (43) and any measurable function g satisfying $E\{g^2(N_1)\} < \infty$; $\hat f$ is the pdf derived by Huber in [7] and has, in this case, bivariates $\hat f^{(j)}$ satisfying (44), for all j. The expected sample numbers under mismatch $E_k\{N \mid \hat g, f^*\}$ are as defined by (51a)-(51b) of Proposition 4.

Remark 12: In contrast to Remark 6 following Proposition 3, the choice of the linear test statistic in (46) and of $\hat g$ solving the linear integral equation of (47) [the solution being given by (55)] does not restrict the validity of the right-hand-side inequality in (63b). This is due to the fact that the test statistic of (46) is optimal within the class of memoryless structures.

Remark  13:  Remarks  7  and  8  following  Proposition  3  are  also  valid  here.  In  particular,  results 
similar  to  the  asymptotic  results  of  (33a)-(33b)  regarding  the  asymptotic  speeds  of  the  SPRT 
hold  for  the  situation  described  by  Proposition  6. 

Proof: The right-hand-side inequality in (63b) follows from the fact that $\hat g$ is selected to optimize the sum of the average sample numbers under the two hypotheses (for desirable error probabilities smaller than $\hat\alpha$ and $\hat\beta$) of the sequential test (46), when the noise pdf is $\hat f$ and the dependence of the observations has been removed through the bounds of (43) and (54). Under these conditions, $\hat g$ is given by (55) and maximizes the ARE for the matched worst case. Also refer to the beginning of Section 2.1 of Part I [1].

The left-hand-side inequality in (63a) is established as follows. We prove the result for k = 1; a similar proof holds for k = 0, as well. From (51b) and the definition of $\omega(x,y;z)$ in (19) we can rewrite the left-hand side of (63a), for k = 1, in the equivalent form

$$E_1\{N \mid \hat g, f^*\} = \frac{\ln[(1-\hat\beta)/\hat\alpha] - \beta \ln[(1-\hat\alpha)(1-\hat\beta)/(\hat\alpha\hat\beta)]}{\tilde\mu_1(\hat g,f)} \approx \frac{\ln[(1-\hat\beta)/\hat\alpha]}{\tilde\mu_1(\hat g,f)}$$

$$\le \frac{\ln[(1-\hat\beta)/\hat\alpha]}{\tilde\mu_1(\hat g,\hat f)} \approx \frac{\ln[(1-\hat\beta)/\hat\alpha] - \hat\beta \ln[(1-\hat\alpha)(1-\hat\beta)/(\hat\alpha\hat\beta)]}{\tilde\mu_1(\hat g,\hat f)} = E_1\{N \mid \hat g, \hat f^*\}$$  (64)

Regarding the denominators of (64) we can apply (61b) to obtain

$$\tilde\mu_1(\hat g,f) \ge \tilde\mu_1(\hat g,\hat f) = \frac{\hat\mu_\theta^2}{2(1+2R)\,\sigma^2(\hat g,\hat f)} > 0$$  (65a)

and

$$\tilde\mu_0(\hat g,f) = \tilde\mu_0(\hat g,\hat f) = -\frac{\hat\mu_\theta^2}{2(1+2R)\,\sigma^2(\hat g,\hat f)} < 0.$$  (65b)

After establishing (65a), the proof of (64) follows arguments similar to those used for the proof of (34) for Proposition 3. We do not repeat them here.

Remark 14: The robust sequential test of (46), which uses the test statistic

$$S_n = \frac{\hat\mu_\theta}{\hat\sigma_0^2}\left[\sum_{i=1}^{n} \hat g(X_i) - \frac{n\,\hat\mu_\theta}{2}\right]$$

with $\hat g$ given by (55), $\hat\mu_\theta$ given by (49), and $\hat\sigma_0^2$ given by (42) upon substitution for $\hat g$, $\hat f$, and $\hat f^{(j)}$ from (44), is easier to implement than the sequential test of [4]. The latter first estimates $\theta$ by an M-estimator for each step n of the SPRT; this involves solving the nonlinear equation $\sum_{i=1}^{n} l(X_i - \hat\theta_n) = 0$ for the estimate $\hat\theta_n$, where $l(x)$ is an appropriate nonlinearity and $\{X_i\}_{i=1}^{n}$ are the n observations collected until the n-th step of the sequential test. Then it performs an SPRT based on the likelihood ratio of $\hat\theta_n$.

4.  Conclusions 

In  this  paper,  we  robustifled  sequential  tests  based  on  memoryless  nonlinearities.  We 
developed  robust  sequential  tests  for  (i)  memoryless  dicrimination  from  two  arbitrary  stationary 
m  -dependent  or  mixing  observations,  and  (ii)  memoryless  detection  of  a  weak  signal  in  addi¬ 
tive  stationary  m  -dependent  or  mixing  noise.  In  both  cases,  the  marginal  pdfs  of  the  two  obser¬ 
vation  sequences  or  of  the  noise  sequence  belong  to  uncertainty  classes,  such  as  e-contaminated 
classes  and  total  variation  classes,  whereas  the  bivariate  pdfs  satisfy  bounds  on  the  correlation 
coefficients  of  time-shifts  of  the  observation  sequences  or  the  noise  sequence. 

The robust sequential tests derived have the form of (5) for the discrimination problem and of (46) for the problem of detecting a weak signal. They consist of SPRTs based on simple linear test statistics involving nonlinearities $g$ associated with the least-favorable pdf in the uncertainty class of marginal pdfs and with bivariate pdfs that achieve the aforementioned bounds on the correlation coefficients of time-shifts of the observation or noise sequences. In the case of detection of weak signals, the test of (46) is considerably easier to implement than the test proposed in [4] for the i.i.d. case (refer to Remark 14).
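
As a rough illustration of how little machinery such a test needs, the following Python sketch runs an SPRT on a linear statistic $S_n = A \sum_{i=1}^{n} g(X_i) + B n$. The function name, the threshold arguments `lower` and `upper`, and the labels returned are assumptions for illustration; the actual nonlinearity, coefficients, and thresholds are those specified by (5) and (46) and are not reproduced here.

```python
def linear_sprt(observations, g, A, B, lower, upper):
    """Run an SPRT on S_n = A * sum_{i<=n} g(X_i) + B * n.

    Returns (label, n): label is "upper" if S_n >= upper, "lower" if
    S_n <= lower, and None if the data run out before either threshold
    is crossed. Which hypothesis each threshold corresponds to depends
    on the sign convention of the test and is left to the caller.
    """
    s = 0.0
    n = 0
    for n, x in enumerate(observations, start=1):
        s += A * g(x) + B  # increment of S_n contributed by X_n
        if s >= upper:
            return "upper", n
        if s <= lower:
            return "lower", n
    return None, n


# Hypothetical usage with an identity nonlinearity and arbitrary thresholds.
decision, stopping_time = linear_sprt([1.0, 1.0, 1.0], lambda x: x, 1.0, 0.0, -2.0, 2.5)
```

The update is a single addition per observation, with no re-estimation of a parameter at each step.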

Coupled  with  the  results  of  the  first  part  of  this  study  (see  [1]),  which  derived  optimal 
sequential  discrimination  schemes  based  on  memoryless  nonlinearities  and  established  their 
superiority  to  the  conventional  i.i.d.  discriminators  and  to  fixed-sample-size  memoryless 
schemes  for  environments  characterized  by  strongly  correlated  observations,  this  paper 
strengthened  further  the  usefulness  of  these  sequential  tests  by  establishing  that  they  can  be 
rendered relatively immune to statistical uncertainty within certain popular classes of distributions.
Appendix A

Review  of  Uncertainty  Models  and  of  Basic  Results  from  the  Theory  of  Minimax  Robustness 

In this Appendix we present a review of the uncertainty models based on 2-alternating capacities and of Huber's basic theory of minimax robustness associated with these models.

Definition: A positive finite set function $v$ on a sample space $\Omega$ with a complete, separable, and metrizable topology and associated Borel field $F$ is called a 2-alternating capacity if it is increasing, continuous from below, continuous from above on closed sets, and satisfies the conditions $v(\emptyset) = 0$, and

$v(A \cup B) + v(A \cap B) \le v(A) + v(B)$.
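
The defining inequality can be checked by brute force on a small finite sample space. The Python sketch below does this for the $\epsilon$-contaminated capacity that appears later in (A-6); the three-point sample space, the nominal probabilities, the value of `eps`, and the function names are hypothetical choices for illustration.

```python
from itertools import combinations


def powerset(points):
    """All subsets of a finite collection, as frozensets."""
    pts = list(points)
    return [frozenset(c) for r in range(len(pts) + 1) for c in combinations(pts, r)]


def contaminated_capacity(nominal, eps):
    """Return v with v(A) = (1 - eps) * m0(A) + eps for nonempty A, v(empty) = 0."""
    def v(A):
        if not A:
            return 0.0
        return (1.0 - eps) * sum(nominal[w] for w in A) + eps
    return v


def is_two_alternating(v, points, tol=1e-12):
    """Check v(A | B) + v(A & B) <= v(A) + v(B) for every pair of subsets."""
    subsets = powerset(points)
    return all(v(A | B) + v(A & B) <= v(A) + v(B) + tol
               for A in subsets for B in subsets)


nominal = {"a": 0.5, "b": 0.3, "c": 0.2}  # hypothetical nominal probabilities
v = contaminated_capacity(nominal, eps=0.1)
result = is_two_alternating(v, nominal)   # True for this capacity
```

For disjoint nonempty sets the left side carries the additive constant `eps` once and the right side twice, which is where the inequality can be strict.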

Suppose now that $M$ is the class of measures on $(\Omega, F)$ and $m \in M$ is a measure. Consider the uncertainty class determined by the 2-alternating capacity $v$ as follows:

$M_v = \{ m \in M \mid m(A) \le v(A),\ \forall A \in F,\ m(\Omega) = v(\Omega) \}$  (A-1)

When $\Omega$ is compact, several popular uncertainty models, such as $\epsilon$-contaminated neighborhoods [7], total variation neighborhoods [7], band classes [8], and p-point classes [9], are special cases of this model. The most general uncertainty classes of the form (A-1) are determined by generalized capacities of the form $v(A) = \int v_a(A)\,\mu(da)$ for all $A \in F$ (see [10]), where $v_a$ is a capacity when conditioned on the parameter vector $a$ and $\mu$ is the measure induced by the joint distribution of these parameters. Fundamental properties of the uncertainty model (A-1) have been studied by Huber and Strassen (see [11]). We state the relevant properties as Lemma 1.

Lemma 1: Suppose $v_0$ and $v_1$ are 2-alternating capacities on $(\Omega, F)$ and $(M_0, M_1)$ are the uncertainty classes determined by $(v_0, v_1)$ as in (A-1). In that case, there exists a Borel-measurable function $\pi_v: \Omega \to [0, \infty]$ such that the average (Bayes) risk $\theta v_0(A) + v_1(A^c)$ is minimized for $A^* = \{\pi_v > \theta\}$, i.e.,

$\theta v_0(\{\pi_v > \theta\}) + v_1(\{\pi_v \le \theta\}) \le \theta v_0(A) + v_1(A^c)$  (A-2)

for all $A \in F$ and $\theta > 0$; $A$ denotes an arbitrary decision test, $A^c$ its complement, $\theta$ can be interpreted as the ratio of the prior probabilities of the two hypotheses $H_0$ and $H_1$, and $\{\pi_v > \theta\}$ can be interpreted as the likelihood ratio test for "$v_0$ versus $v_1$". Clearly, (A-2) together with (A-1) imply that

$\theta m_0(\{\pi_v > \theta\}) + m_1(\{\pi_v \le \theta\}) \le \theta v_0(\{\pi_v > \theta\}) + v_1(\{\pi_v \le \theta\}) \le \theta v_0(A) + v_1(A^c)$  (*)

for all $m_0 \in M_0$, $m_1 \in M_1$, $\theta > 0$, and $A \in F$; this inequality establishes the minimax robustness of the test based on $\pi_v$. Furthermore, there exist measures $(\hat m_0, \hat m_1)$ in $M_0 \times M_1$ such that

$\hat m_0(\{\pi_v > \theta\}) = v_0(\{\pi_v > \theta\}) \ge m_0(\{\pi_v > \theta\})$  (A-3)

$\hat m_1(\{\pi_v \le \theta\}) = v_1(\{\pi_v \le \theta\}) \ge m_1(\{\pi_v \le \theta\})$  (A-4)

for all $\theta > 0$; these inequalities imply that $\pi_v$ is stochastically largest over $M_0$ under $\hat m_0$ and stochastically smallest over $M_1$ under $\hat m_1$. The quantity $\pi_v$ is sometimes termed the Huber-Strassen derivative of the classes $M_0$ and $M_1$, is denoted by $dv_1/dv_0$, is given by $\pi_v = d\hat m_1/d\hat m_0$, and is unique a.e. $[\hat m_1 + \hat m_0]$; it plays the role of the worst-case likelihood ratio for the two uncertainty classes. The dominance properties (A-3)-(A-4) establish the existence of measures in the classes $M_0$ and $M_1$ that achieve the upper values provided by $v_0$ and $v_1$ for sets of the form $\{\pi_v \le \theta\}$ and their complements. The measures $(\hat m_0, \hat m_1)$ are termed the least-favorable measures over $M_0 \times M_1$. For the aforementioned four uncertainty classes ($\epsilon$-contaminated mixtures, total variation classes, band classes, and p-point classes), which are special cases of the general model (A-1) when $\Omega$ is compact, the least-favorable pairs of probability measures (actually, the corresponding probability density functions) have been derived in closed form ([7]-[9]). Depending on the form of the joint distributions of the parameter vector $a$ under the two hypotheses, even the most general uncertainty model that can be obtained from (A-1), that is, when $v_j(A) = \int v_{a,j}(A)\,\mu_j(da)$ for all $A \in F$ are the generalized capacities of [10], can result in closed-form expressions for the least-favorable probability measures.


Example: Consider the $\epsilon$-contaminated mixture uncertainty classes of probability measures described in [7]

$M_j = \{ m_j \in M \mid m_j(A) = (1-\epsilon_j)\, m_j^0(A) + \epsilon_j \tilde m_j(A) \text{ for all } A \in F,\ \tilde m_j(\Omega) = m_j^0(\Omega) = 1 \}$  (A-5)

for $j = 0, 1$, which are determined by the known nominal probability measures $m_0^0$ and $m_1^0$ and the degrees of uncertainty $\epsilon_0$ and $\epsilon_1$ ($0 \le \epsilon_j < 1$ for $j = 0, 1$); the unknown probability measures $\tilde m_j$ are allowed to take arbitrary values. This uncertainty class is appropriate for modeling situations in which the probability measures governing the observations are convex combinations of known probability measures and arbitrary probability measures. Then the associated 2-alternating capacities are

$v_j(A) = \begin{cases} (1-\epsilon_j)\, m_j^0(A) + \epsilon_j, & A \ne \emptyset \\ 0, & A = \emptyset \end{cases}$  (A-6)

and the least-favorable distributions are

$\dfrac{d\hat m_0}{d\lambda} = \begin{cases} (1-\epsilon_0)\, dm_0^0/d\lambda, & dm_1^0/dm_0^0 < c_0 \\ [(1-\epsilon_0)/c_0]\, dm_1^0/d\lambda, & c_0 \le dm_1^0/dm_0^0 \end{cases}$  (A-7a)

$\dfrac{d\hat m_1}{d\lambda} = \begin{cases} (1-\epsilon_1)\, dm_1^0/d\lambda, & c_1 < dm_1^0/dm_0^0 \\ c_1 (1-\epsilon_1)\, dm_0^0/d\lambda, & dm_1^0/dm_0^0 \le c_1 \end{cases}$  (A-7b)

where $\lambda$ is the Lebesgue measure and $0 \le c_1 < c_0 \le \infty$ are constants such that $\hat m_1(\Omega) = \hat m_0(\Omega) = 1$, and the Huber-Strassen derivative $\pi_v$ has the form

$\pi_v = \dfrac{d\hat m_1}{d\hat m_0} = \dfrac{1-\epsilon_1}{1-\epsilon_0}\, \min\{ c_0, \max\{ c_1, dm_1^0/dm_0^0 \} \}$  (A-8)

which consists of a censored version of the nominal likelihood ratio $dm_1^0/dm_0^0$.
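
The censoring in (A-8) can be sketched numerically. In the Python fragment below, the Gaussian nominal densities, the contamination levels, and the constants `c1` and `c0` are hypothetical; in particular `c1` and `c0` are simply fixed rather than solved from the normalization conditions, so the densities returned are not normalized. The sketch only shows that the ratio of the two piecewise densities of (A-7a)-(A-7b) equals the scaled, clipped nominal ratio.

```python
import math


def gaussian_pdf(x, mean, std=1.0):
    """Density of a normal distribution, used here as a nominal pdf."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def censored_ratio(x, c1, c0, eps0, eps1, f0, f1):
    """Form of (A-8): scaled nominal likelihood ratio clipped to [c1, c0]."""
    nominal_ratio = f1(x) / f0(x)
    return (1.0 - eps1) / (1.0 - eps0) * min(c0, max(c1, nominal_ratio))


def least_favorable_densities(x, c1, c0, eps0, eps1, f0, f1):
    """Piecewise densities of (A-7a)-(A-7b) at x, for given constants c1 < c0."""
    ratio = f1(x) / f0(x)
    q0 = (1.0 - eps0) * f0(x) if ratio < c0 else (1.0 - eps0) / c0 * f1(x)
    q1 = (1.0 - eps1) * f1(x) if ratio > c1 else c1 * (1.0 - eps1) * f0(x)
    return q0, q1


f0 = lambda x: gaussian_pdf(x, 0.0)  # hypothetical nominal pdf under H0
f1 = lambda x: gaussian_pdf(x, 1.0)  # hypothetical nominal pdf under H1
params = dict(c1=0.5, c0=2.0, eps0=0.1, eps1=0.2, f0=f0, f1=f1)

for x in (-2.0, 0.0, 3.0):           # lower-clipped, unclipped, upper-clipped
    q0, q1 = least_favorable_densities(x, **params)
    assert abs(q1 / q0 - censored_ratio(x, **params)) < 1e-12
```

The three evaluation points fall in the three regions of the clipping: below `c1`, between `c1` and `c0`, and above `c0`.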

Recently, in [12] the dominance properties (A-3) and (A-4) were exploited to extend the Huber-Strassen theory to more general objective functions than the Bayes risk of (A-2). We cite the following result from [12] as Lemma 2, omitting the proof provided there.

Lemma 2: Suppose that the measures $(m_0, m_1)$ on $(\Omega, F)$ belong to $M_0 \times M_1$ characterized by (A-1) and that $x$ is a real variable:

(i) If one of the following situations holds:

(a) both $g(\pi_v)$ and $h(x)$ are nonnegative, increasing functions of $\pi_v$ (the Huber-Strassen derivative) and $x$, respectively;

(b) $g(\pi_v)$ is a nonnegative, decreasing function of $\pi_v$ and $h(x)$ is a nonpositive and increasing function of $x$;

(c) $g(\pi_v)$ is a nonpositive, increasing function of $\pi_v$ and $h(x)$ is a nonnegative and decreasing function of $x$;

(d) both $g(\pi_v)$ and $h(x)$ are nonpositive and decreasing functions of $\pi_v$ and $x$, respectively;

then

$\int_\Omega g(\pi_v(x))\, h(x)\, m_0(dx) \le \int_\Omega g(\pi_v(x))\, h(x)\, \hat m_0(dx)$  (A-9)

$\int_\Omega g(\pi_v(x))\, h(x)\, m_1(dx) \ge \int_\Omega g(\pi_v(x))\, h(x)\, \hat m_1(dx)$  (A-10)

(ii) If one of the following situations holds:

(a) both $g(\pi_v)$ and $h(x)$ are nonnegative, decreasing functions of $\pi_v$ and $x$, respectively;

(b) $g(\pi_v)$ is a nonnegative, increasing function of $\pi_v$ and $h(x)$ is a nonpositive and decreasing function of $x$;

(c) $g(\pi_v)$ is a nonpositive, decreasing function of $\pi_v$ and $h(x)$ is a nonnegative and increasing function of $x$;

(d) both $g(\pi_v)$ and $h(x)$ are nonpositive and increasing functions of $\pi_v$ and $x$, respectively;

then

$\int_\Omega g(\pi_v(x))\, h(x)\, m_0(dx) \ge \int_\Omega g(\pi_v(x))\, h(x)\, \hat m_0(dx)$  (A-11)

$\int_\Omega g(\pi_v(x))\, h(x)\, m_1(dx) \le \int_\Omega g(\pi_v(x))\, h(x)\, \hat m_1(dx)$  (A-12)

where $\hat m_0$ and $\hat m_1$ are singled out by Lemma 1.

Remark 1: If either $g(x) = 1$ or $h(x) = 1$ for all $x$, i.e., one of the two functions $g$ or $h$ is absent from the integrands of (A-9)-(A-12), the inequalities in (A-9)-(A-12) still hold; in this case the nonnegativity of the function involved is not a necessary condition.

Remark 2: Lemmas 1 and 2 hold even if the 2-alternating capacity $v_0$ is itself a measure. In this case, the uncertainty class $M_0$ has a single element $v_0$.

Remark 3: If the nominal measures $m_j^0$ characterizing the uncertainty class (e.g., the $\epsilon$-contaminated or total variation classes or the upper and lower bounds in the case of the band class) are absolutely continuous with respect to the Lebesgue measure $\lambda$ on $(\Omega, F)$, that is $m_j^0 \ll \lambda$, then for the least-favorable measures singled out by Lemma 1 $\hat m_0 \ll \lambda$ and $\hat m_1 \ll \lambda$ as well. In other words, if the nominal distributions have densities (pdfs), so do the least-favorable ones, although many elements of the uncertainty class in (A-1) may not have pdfs.


Appendix  B 


Establishing  the  Sufficient  Conditions  (28a)-(28b)  for 
the  Minimax  Robustness  of  the  Sequential  Test  of  (5) 


In  this  Appendix  we  establish  (28a)-(28b),  the  sufficient  conditions  for  (26)  which 
expresses  the  minimax  robustness  (actually  least-favorability)  of  the  sequential  test  of  (5)  for 
the  error  probabilities. 

Since $d\hat g/d\pi_v = 1/[A\pi_v + 1]^2 > 0$, $\hat g$ is an increasing function of $\pi_v$. Consequently, from (A-3) of Appendix A we obtain

$\mu(\hat g, F_0) = \int \hat g(x)\, dF_0(x) \le \int \hat g(x)\, d\hat F_0(x) = \mu(\hat g, \hat f_0)$  (B-1a)

and from (A-4) we obtain

$\mu(\hat g, F_1) = \int \hat g(x)\, dF_1(x) \ge \int \hat g(x)\, d\hat F_1(x) = \mu(\hat g, \hat f_1)$  (B-1b)

which establish the desired inequalities involving the means in conditions (28a) and (28b).

The corresponding proof for the variances in (28a) and (28b) is more complicated. We actually show that

$\int [\hat g(x) - \mu(\hat g, F_0)]^2\, dF_0(x) \le \int [\hat g(x) - \mu(\hat g, \hat f_0)]^2\, dF_0(x) \le \int [\hat g(x) - \mu(\hat g, \hat f_0)]^2\, d\hat F_0(x)$  (B-2a)

and

$\int [\hat g(x) - \mu(\hat g, F_1)]^2\, dF_1(x) \le \int [\hat g(x) - \mu(\hat g, \hat f_1)]^2\, dF_1(x) \le \int [\hat g(x) - \mu(\hat g, \hat f_1)]^2\, d\hat F_1(x)$.  (B-2b)

The left-hand-side inequalities in (B-2a)-(B-2b) follow from an application of the minimum variance principle of estimating a random variable $\hat g(X)$ by its mean $\mu(\hat g, F_k)$ under cdf $F_k$. The right-hand-side inequalities in (B-2a)-(B-2b) follow from the dominance properties (A-3) and (A-4) of Appendix A, provided that the function $[\hat g(x) - \mu(\hat g, \hat f_0)]^2$ is increasing in the Huber-Strassen derivative $\pi_v$ and the function $[\hat g(x) - \mu(\hat g, \hat f_1)]^2$ is decreasing in $\pi_v$. These last facts are established as follows. We notice that


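$\mu(\hat g, \hat f_0) = \int \dfrac{\hat f_1(x)\, \hat f_0(x)\, dx}{A \hat f_1(x) + \hat f_0(x)}$  and  $\mu(\hat g, \hat f_1) = \int \dfrac{\hat f_1^2(x)\, dx}{A \hat f_1(x) + \hat f_0(x)}$  (B-3)

which implies

$\hat g(x) - \mu(\hat g, \hat f_0) = \dfrac{[1 - A\mu(\hat g, \hat f_0)]\, \pi_v(x) - \mu(\hat g, \hat f_0)}{A\pi_v(x) + 1}$  (B-4a)

and

$\dfrac{\partial}{\partial \pi_v} [\hat g(x) - \mu(\hat g, \hat f_0)]^2 = \dfrac{2\{[1 - A\mu(\hat g, \hat f_0)]\, \pi_v(x) - \mu(\hat g, \hat f_0)\}}{[A\pi_v(x) + 1]^3} > 0$  (B-5a)

if the conditions

$\mu(\hat g, \hat f_0) < 0$ and $A > 0$  (A2)

hold. Similarly,

$\hat g(x) - \mu(\hat g, \hat f_1) = \dfrac{[1 - A\mu(\hat g, \hat f_1)]\, \pi_v(x) - \mu(\hat g, \hat f_1)}{A\pi_v(x) + 1}$  (B-4b)

and

$\dfrac{\partial}{\partial \pi_v} [\hat g(x) - \mu(\hat g, \hat f_1)]^2 = \dfrac{2\{[1 - A\mu(\hat g, \hat f_1)]\, \pi_v(x) - \mu(\hat g, \hat f_1)\}}{[A\pi_v(x) + 1]^3} < 0$  (B-5b)

if the following conditions hold

$\mu(\hat g, \hat f_1) > 0$ and $A\mu(\hat g, \hat f_1) > 1$.  (A3)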
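
The slope expression used in (B-5a) and (B-5b) can be checked against a finite-difference approximation. The Python sketch below assumes the form $\hat g = \pi_v/(A\pi_v + 1)$, which is what (B-4a) implies; the numerical values of `A` and `mu` are hypothetical and are chosen only to satisfy the sign conditions being illustrated.

```python
def g_hat(pi, A):
    """Nonlinearity as a function of the Huber-Strassen derivative (assumed form)."""
    return pi / (A * pi + 1.0)


def squared_dev_slope(pi, A, mu):
    """Closed-form slope of [g_hat(pi) - mu]^2 in pi, as in (B-5a)/(B-5b)."""
    return 2.0 * ((1.0 - A * mu) * pi - mu) / (A * pi + 1.0) ** 3


def numeric_slope(pi, A, mu, h=1e-6):
    """Central finite-difference slope of [g_hat(pi) - mu]^2."""
    f = lambda p: (g_hat(p, A) - mu) ** 2
    return (f(pi + h) - f(pi - h)) / (2.0 * h)


grid = [0.0, 0.5, 1.0, 3.0]   # sample values of the derivative pi_v

# Hypothetical values with mu > 0 and A * mu > 1: slope is negative on the grid.
slopes_decreasing = [squared_dev_slope(p, 2.0, 0.7) for p in grid]

# Hypothetical values with mu < 0 and A > 0: slope is positive on the grid.
slopes_increasing = [squared_dev_slope(p, 2.0, -0.3) for p in grid]
```

With `mu > 0` and `A * mu > 1` both terms in the numerator are nonpositive for every `pi >= 0`, and with `mu < 0` and `A > 0` both are nonnegative, which is the sign pattern the two pairs of conditions are designed to guarantee.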


